Feed

6 latest items

DAIR.AI · X · Post · 3 days ago

Do proactive agents really need an LLM to decide when to 'wake up'?

Pinned: Do proactive agents really need an LLM to decide when to wake? The default proactive agent calls an LLM on every event just to decide whether to wake up. That is a lot of expensive inference spent on a yes or no. New research from Microsoft and Purdue asks whether the trigger really needs a language model at all. Their answer is a 220MiB temporal-graph encoder that decides when to wake and what context to anchor. It gains +16.7 mean F1 across 14 backbones, runs 4 to 83x faster, and fits on-device at around 11ms per event. If you run an always-on agent loop, the polling decision is quietly the main cost driver. A tiny encoder removes it without giving up accuracy. Paper: https://arxiv.org/abs/2605.30152 Learn to build effective AI agents in our academy: https://academy.dair.ai/

›Traditional proactive agents waste compute by calling an LLM on every event just to decide whether to activate.

#AI Agents #Performance optimization #Model architecture
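
To make the cost argument in the post concrete, here is a minimal sketch of a loop that puts a cheap gate in front of the expensive model. It is not the paper's temporal-graph encoder: `Event`, `cheap_gate`, `run_loop`, and `fake_llm` are hypothetical names, and the threshold rule only stands in for a small learned trigger.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    kind: str        # hypothetical event type, e.g. "email" or "heartbeat"
    urgency: float   # hypothetical score in [0, 1] supplied upstream


def cheap_gate(event: Event, threshold: float = 0.5) -> bool:
    """Stand-in for a small learned trigger (assumption: a plain threshold rule)."""
    return event.kind != "heartbeat" and event.urgency >= threshold


def run_loop(events: List[Event], llm: Callable[[Event], str]) -> List[str]:
    """Call the expensive model only for events the gate lets through."""
    return [llm(e) for e in events if cheap_gate(e)]


calls: List[str] = []


def fake_llm(event: Event) -> str:
    # Records each invocation so the number of expensive calls can be counted.
    calls.append(event.kind)
    return f"handled {event.kind}"


events = [Event("heartbeat", 0.9), Event("email", 0.2), Event("email", 0.8)]
results = run_loop(events, fake_llm)
```

With these three toy events the model is invoked once instead of three times. In the default design the post describes, every event would reach the LLM just to get a yes or no.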

Sebastian Raschka · X · Post · 11 days ago

Cohere Command A+: a powerful LLM optimized for hardware efficiency

It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅 Interesting tidbit is the parallel block design. Via the Cmd-A tech report: "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."

›Cohere has released Command A+, its most powerful LLM, optimized to run on less hardware and released as open source.

#LLM #Open Source #Model architecture
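
The parallel block mentioned in the post can be sketched as follows, assuming it follows the commonly used formulation in which attention and the MLP both read the same normalized input and their outputs are summed into the residual stream. The toy sublayers and all function names are illustrative and are not taken from the Cmd-A tech report.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def sequential_block(x: Vector, norm: Sublayer, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Vanilla pre-norm block: the MLP waits for the attention output."""
    h = add(x, attn(norm(x)))
    return add(h, mlp(norm(h)))


def parallel_block(x: Vector, norm: Sublayer, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Parallel block: attention and MLP share one normalized input."""
    n = norm(x)
    return add(x, attn(n), mlp(n))


# Toy stand-ins for the real sublayers (assumptions for illustration only).
identity: Sublayer = lambda v: list(v)
toy_attn: Sublayer = lambda v: [2.0 * a for a in v]
toy_mlp: Sublayer = lambda v: [a + 1.0 for a in v]

x = [1.0, 2.0]
seq_out = sequential_block(x, identity, toy_attn, toy_mlp)
par_out = parallel_block(x, identity, toy_attn, toy_mlp)
```

The two variants compute different functions, so a model has to be trained with the parallel form. The throughput benefit comes from the two sublayers no longer depending on each other, which lets an implementation fuse or overlap their computation.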

Sebastian Raschka · X · Post · 16 days ago

A visual overview of recent advances in LLM architecture

New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4. I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC. Link: https://magazine.sebastianraschka.com/p/recent-developments-in-llm-architectures

›Sebastian Raschka's article walks through recent LLM architecture advances, from Gemma 4 to DeepSeek V4, using illustrations.

#LLM #Model architecture #Long-context

Sebastian Raschka · Blog · Article · 16 days ago

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

›New LLM architectures focus on efficient long-context processing through KV sharing, per-layer embeddings, and compressed attention.
›KV-cache size, memory traffic, and attention cost become the main constraints when agent workflows retain many tokens.
›Gemma 4, Laguna XS.2, ZAYA1-8B, and DeepSeek V4 apply these architectural techniques to reduce compute cost.

#LLM #Model architecture #Attention mechanism #Compute efficiency
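
A rough back-of-the-envelope calculation shows why KV-cache size becomes a constraint at long context. The sketch below uses the standard size formula for a decoder-only transformer. The configuration values are hypothetical and do not describe any of the models named above, and KV sharing is modeled simply as fewer layers storing their own cache, which is an assumption.

```python
def kv_cache_bytes(
    num_kv_layers: int,       # layers that store their own K/V tensors
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    bytes_per_value: int = 2, # 2 bytes per value for fp16/bf16
    batch_size: int = 1,
) -> int:
    """Size of the KV cache: one K and one V tensor per caching layer."""
    return (2 * num_kv_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


GIB = 2 ** 30

# Hypothetical config: 32 layers, 8 KV heads of dim 128, 128K-token context.
full = kv_cache_bytes(num_kv_layers=32, num_kv_heads=8, head_dim=128, seq_len=131072)

# Assumed cross-layer KV sharing: each pair of layers reuses one cache,
# so only 16 of the 32 layers store K/V.
shared = kv_cache_bytes(num_kv_layers=16, num_kv_heads=8, head_dim=128, seq_len=131072)

print(f"full: {full / GIB:.0f} GiB, shared: {shared / GIB:.0f} GiB")
```

Under these assumptions the cache drops from 16 GiB to 8 GiB per sequence, and the memory traffic per decoded token falls with it.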

Sebastian Raschka · X · Post · 19 days ago

Lessons from building LLM architectures from scratch in Python and PyTorch

A little talk on what we can learn from implementing LLM architectures from scratch in Python and PyTorch. And how I approach new open-weight models, compare them against reference implementations etc: https://www.youtube.com/watch?v=TXzQ7PGpO6w

›Studying LLM architectures by coding them from scratch gives a deeper understanding of their internal mechanisms.

#LLM #PyTorch #Model architecture

Sebastian Raschka · X · Post · 22 days ago

An overview of recently released LLM architecture components

Back from a little family break! Lots has happened, and I’m planning to do a deeper dive into the most interesting architectural components (soon). Btw, are there any major architectures I missed below?

›Many new LLMs and architectures have been released with interesting improvements.

#LLM #Model architecture #Roundup