Feed

DAIR.AI · X post · 6 days ago
Language Models Need Sleep
// Language Models Need Sleep //

Let your agents "sleep", folks. On a serious note, this is a fascinating paper on getting the most from long-horizon agents.

Here is the problem with agents today: Attention scales badly with context length, so long-horizon agents keep paying a quadratic tax at inference time.

This work proposes a sleep-like consolidation step instead. The model periodically does N offline recurrent passes over recent context, writes the result into persistent fast weights in its state-space blocks, then clears the KV cache. The effect is that extra compute moves to sleep while wake-time prediction stays low latency.

On cellular automata, multi-hop graph retrieval, and a math reasoning task where a plain transformer and SSM-attention hybrids fail, longer sleep durations improve performance, with the biggest gains on examples that need deeper reasoning.

Why does it matter? It points at an alternative to ever-larger KV caches for agents that run for a long time. Consolidate, then forget the raw tokens.

Paper: https://arxiv.org/abs/2605.26099

Learn to build effective AI agents in our academy: https://academy.dair.ai/
  • Today's long-horizon agents pay a quadratic cost because attention scales poorly with long contexts.
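
To make the "consolidate, then forget the raw tokens" idea concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions, not the paper's method. The function `run_agent`, the parameters `wake_window`, `sleep_passes`, and `lr`, and the Hebbian outer-product update are all hypothetical stand-ins chosen for illustration. Wake cost counts one unit per cached token each new token attends to (plus itself), and sleep cost counts one unit per token per offline pass.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def run_agent(
    tokens: List[Vector],
    wake_window: Optional[int],
    sleep_passes: int = 0,
    lr: float = 0.01,
) -> Tuple[int, int, List[List[float]]]:
    """Toy model: return (wake_cost, sleep_cost, fast_weights).

    wake_window=None means the agent never sleeps and the cache grows forever.
    """
    dim = len(tokens[0])
    # Persistent state that survives cache clears (stand-in for fast weights).
    fast_weights = [[0.0] * dim for _ in range(dim)]
    kv_cache: List[Vector] = []  # raw recent tokens (stand-in for the KV cache)
    wake_cost = 0
    sleep_cost = 0

    for x in tokens:
        # Wake step: the new token attends to every cached token and itself.
        wake_cost += len(kv_cache) + 1
        kv_cache.append(x)

        if wake_window is not None and len(kv_cache) == wake_window:
            # Sleep: N offline passes over the recent context.
            for _ in range(sleep_passes):
                for v in kv_cache:
                    sleep_cost += 1
                    # ASSUMPTION: a Hebbian outer-product write, used here only
                    # as a placeholder for whatever update the paper uses.
                    for i in range(dim):
                        for j in range(dim):
                            fast_weights[i][j] += lr * v[i] * v[j]
            # Forget the raw tokens; only fast_weights persist.
            kv_cache.clear()

    return wake_cost, sleep_cost, fast_weights


tokens = [[1.0, 0.0]] * 1024
baseline_wake, _, _ = run_agent(tokens, wake_window=None)
sleepy_wake, sleepy_sleep, weights = run_agent(tokens, wake_window=128, sleep_passes=4)
print(baseline_wake, sleepy_wake, sleepy_sleep)  # 524800 66048 4096
```

In this toy setting, wake cost drops from 524800 to 66048 units, while the extra 4096 units are spent during sleep. That mirrors the post's point that compute moves offline while wake-time prediction stays cheap. The numbers say nothing about accuracy, which the post reports only for the paper's own tasks.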