Feed

AK (@_akhaliq) · X post · 3 days ago

Step 3.7 Flash by StepFun: a versatile MoE model with agentic, coding, and multimodal capabilities

RT by @_akhaliq: Impressive release by StepFun, explore it at https://paperswithcode.co/paper/83892

› StepFun releases Step 3.7 Flash, an MoE model with 198B total parameters but only ~11B active per token, reaching 400 TPS (tokens per second) with a 256K-token context window.

#MoE #Agentic AI #Open weights #Multimodal
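The gap between 198B total and ~11B active parameters comes from sparse expert routing: each token is sent to only a few experts, so most weights sit idle on any given forward pass. Below is a generic top-k router sketch, not StepFun's implementation. The expert count (64), k (4), and random gate scores are hypothetical, since the post does not state them. The last lines check the post's own ratio, about 5.6% of weights active per token. Note that the expert fraction and the active-parameter fraction need not match, because shared layers such as attention run for every token.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts only, a common
    (but not universal) choice in MoE routers.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))

# Hypothetical router config: 64 experts, 4 active per token (not StepFun's numbers).
NUM_EXPERTS = 64
TOP_K = 4

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(logits, TOP_K)
print(len(routes), "of", NUM_EXPERTS, "experts run for this token")

# Ratio from the post: ~11B active out of 198B total parameters.
active_fraction = 11e9 / 198e9
print(f"active fraction ≈ {active_fraction:.1%}")
```

Only the selected experts' feed-forward weights would be multiplied for this token; the other 60 are skipped, which is where the speed advantage of a sparse model comes from.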

Andrew Ng · X post · about 2 months ago

Voice UI that combines speech with real-time on-screen updates

I'm excited about voice as a UI layer for existing visual applications — where speech and screen update together. This goes well beyond voice-only use cases like call center automation. The barrier has been a hard technical tradeoff: low-latency voice models lack reliability, while agentic pipelines (speech-to-text → LLM → text-to-speech) are intelligent but too slow for conversation. Ashwyn Sharma and team at Vocal Bridge (an AI Fund portfolio company) address this with a dual-agent architecture: a foreground agent for real-time conversation, a background agent for reasoning, guardrails, and tool calls. I used Vocal Bridge to add voice to a math-quiz app I'd built for my daughter; this took less than an hour with Claude Code. She speaks her answers, the app responds verbally and updates the questions and animations on screen. Only a tiny fraction of developers have ever built a voice app. If you'd like to try building one, check out Vocal Bridge for free: https://vocalbridgeai.com

› Voice UI goes well beyond voice-only use cases, letting speech and the screen update together.

#Voice AI #Multimodal #Agent Architecture
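The dual-agent split described in the post can be pictured as two concurrent tasks: a foreground agent that replies right away and a background agent that does the slower reasoning or tool calls and then pushes a screen update. This is a minimal sketch assuming Python's `asyncio`; the function names, the queue hand-off, and the quiz check are hypothetical and do not reflect Vocal Bridge's actual API.

```python
import asyncio

async def background_agent(jobs: asyncio.Queue, ui_updates: list) -> None:
    """Slow path (hypothetical): reasoning, guardrails, tool calls, then a screen update."""
    while True:
        utterance = await jobs.get()
        if utterance is None:  # sentinel: shut down
            jobs.task_done()
            break
        await asyncio.sleep(0.05)  # stand-in for an LLM or tool call
        correct = utterance.strip() == "12"  # hypothetical check for "What is 3 x 4?"
        ui_updates.append("next_question" if correct else "show_hint")
        jobs.task_done()

async def foreground_agent(utterances, jobs: asyncio.Queue, spoken: list) -> None:
    """Fast path (hypothetical): acknowledge each utterance immediately, defer heavy work."""
    for utterance in utterances:
        spoken.append(f"Got it, you said {utterance}.")  # low-latency verbal reply
        await jobs.put(utterance)                          # hand off to the background agent
    await jobs.put(None)

async def main(utterances):
    jobs: asyncio.Queue = asyncio.Queue()
    spoken, ui_updates = [], []
    worker = asyncio.create_task(background_agent(jobs, ui_updates))
    await foreground_agent(utterances, jobs, spoken)
    await worker
    return spoken, ui_updates

if __name__ == "__main__":
    spoken, ui = asyncio.run(main(["12", "7"]))
    print(spoken)
    print(ui)
```

The foreground agent never awaits the slow path, so its spoken replies are not held up by reasoning latency; the background agent catches up and drives the visual changes.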
