// Language Models Need Sleep //
Let your agents "sleep", folks.
On a serious note, this is a fascinating paper on getting the most from long-horizon agents.
Here is the problem with agents today: attention cost grows quadratically with context length, so long-horizon agents keep paying an ever-larger compute tax at inference time.
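To make the "quadratic tax" concrete, here is a back-of-the-envelope count. It assumes a single head in a single layer of full causal attention and ignores constant factors. Token t scores against every token up to and including itself, so a context of n tokens costs n(n+1)/2 query-key scores. The function name is illustrative only.

```python
def attention_scores(context_len: int) -> int:
    """Query-key scores for one causal attention pass over `context_len`
    tokens (single head, single layer; illustrative count only)."""
    # Token t attends to tokens 1..t, so the total is 1 + 2 + ... + n.
    return context_len * (context_len + 1) // 2


for n in (1_000, 2_000, 4_000, 8_000):
    print(f"{n:>6,} tokens -> {attention_scores(n):>12,} scores")
```

Going from 1,000 to 2,000 tokens takes the count from 500,500 to 2,001,000, so each doubling of the context roughly quadruples the work.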
This work proposes a sleep-like consolidation step instead. The model periodically runs N offline recurrent passes over recent context, writes the result into persistent fast weights in its state-space blocks, and then clears the KV cache.
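The wake/sleep loop can be sketched in a few lines. This is a toy under loud assumptions, not the paper's method: the class `SleepyMemory`, its methods, and the learning rate are hypothetical, and a plain delta-rule associative memory (W += lr * (v - W k) k^T) stands in for the model's recurrent passes and state-space fast weights. What it keeps from the description is the shape of the idea: a KV cache that grows while awake, N offline passes over that cache during sleep that write into persistent fast weights, and then an emptied cache.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


class SleepyMemory:
    """Hypothetical toy: a wake-time KV cache plus persistent fast weights."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.lr = lr
        self.kv_cache: List[Tuple[Vector, Vector]] = []
        self.fast_weights: Matrix = [[0.0] * dim for _ in range(dim)]

    def observe(self, key: Vector, value: Vector) -> None:
        # Wake: cheap append, no consolidation work.
        self.kv_cache.append((key, value))

    def sleep(self, n_passes: int) -> None:
        # Sleep: n_passes offline sweeps over the cached context. Each step
        # applies an assumed delta-rule update W += lr * (v - W k) k^T.
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i, (v_i, p_i) in enumerate(zip(value, pred)):
                    err = self.lr * (v_i - p_i)
                    for j, k_j in enumerate(key):
                        self.fast_weights[i][j] += err * k_j
        # Forget the raw tokens; only the fast weights survive.
        self.kv_cache.clear()

    def recall(self, key: Vector) -> Vector:
        # Wake-time read: one matvec, independent of how much was consolidated.
        return matvec(self.fast_weights, key)


def recall_error(n_passes: int) -> float:
    """Worst-case recall error after one sleep of `n_passes` passes."""
    keys = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    values = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0],
              [0.0, 1.0, 0.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
    mem = SleepyMemory(dim=4)
    for k, v in zip(keys, values):
        mem.observe(k, v)
    mem.sleep(n_passes)
    assert not mem.kv_cache  # the cache is empty after sleeping
    return max(abs(r - t) for k, v in zip(keys, values)
               for r, t in zip(mem.recall(k), v))


for n in (1, 2, 4, 8):
    print(n, recall_error(n))  # errors: 2.0, 1.0, 0.25, 0.015625
```

The keys are orthonormal so the sweeps do not interfere with one another, which makes each extra pass halve the error exactly. That is a toy echo of "longer sleep helps" while the cost of `recall` stays fixed; real models with correlated keys and learned recurrences will behave differently.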
The upshot: extra compute shifts to sleep, while wake-time prediction stays low-latency. On cellular automata, multi-hop graph retrieval, and a math reasoning task where a plain transformer and SSM-attention hybrids fail, longer sleep durations improve performance, with the biggest gains on examples that need deeper reasoning.
Why does it matter?
It points at an alternative to ever-larger KV caches for agents that run for a long time. Consolidate, then forget the raw tokens.
Paper: https://arxiv.org/abs/2605.26099
Learn to build effective AI agents in our academy: https://academy.dair.ai/