The Efficiency Frontier of LLMs: Are You Overpaying for Context?
// The Efficiency Frontier in LLMs //
(bookmark this one)
How much are you overpaying for context you do not need?
Context costs dominate production LLM bills, and the right strategy depends on how often you reuse preprocessing. Modeling that reuse explicitly lets you pick the cheapest point that still hits your target quality.
This work treats context-strategy selection as a deployment-aware optimization problem instead of a fixed choice, using amortized cost modeling across performance, token cost, and preprocessing reuse.
The approach achieves roughly 25% token savings at equal F1 (around 0.78), and amortized memory compression delivers more than 50% lower token cost versus full-context in high-performance settings. Results come from tests on 5,000 HotpotQA instances.
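To make the amortization idea concrete, here is a minimal sketch in Python. It is not the paper's model: it assumes the simplest possible form, where a strategy's cost per query is its per-query tokens plus its one-time preprocessing tokens divided by the number of reuses. The names (`Strategy`, `amortized_cost`, `cheapest_meeting_target`) and every number in `STRATEGIES` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    f1: float                 # hypothetical quality score
    tokens_per_query: float   # hypothetical tokens sent with each query
    preprocess_tokens: float  # hypothetical one-time preprocessing tokens


def amortized_cost(s: Strategy, reuse_count: int) -> float:
    """Per-query tokens plus the one-time preprocessing cost spread over reuses."""
    if reuse_count < 1:
        raise ValueError("reuse_count must be at least 1")
    return s.tokens_per_query + s.preprocess_tokens / reuse_count


def cheapest_meeting_target(strategies, target_f1, reuse_count):
    """Return the lowest-amortized-cost strategy with f1 >= target_f1, or None."""
    eligible = [s for s in strategies if s.f1 >= target_f1]
    if not eligible:
        return None
    return min(eligible, key=lambda s: amortized_cost(s, reuse_count))


# Illustrative numbers only; none of these come from the paper.
STRATEGIES = [
    Strategy("full_context", f1=0.80, tokens_per_query=4000, preprocess_tokens=0),
    Strategy("retrieval", f1=0.74, tokens_per_query=1200, preprocess_tokens=0),
    Strategy("memory_compression", f1=0.79, tokens_per_query=1500, preprocess_tokens=20000),
]

if __name__ == "__main__":
    for reuse in (1, 10, 100):
        best = cheapest_meeting_target(STRATEGIES, target_f1=0.78, reuse_count=reuse)
        print(reuse, best.name, amortized_cost(best, reuse))
```

With these made-up figures, full context is the cheapest option that clears the 0.78 target when preprocessing is used once, while memory compression takes over once its preprocessing is reused about nine or more times. The crossover point depends entirely on the assumed costs, which is why the deployment's actual reuse rate has to be part of the decision.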
Paper: https://arxiv.org/abs/2605.23071
Learn to build effective AI agents in our academy: https://academy.dair.ai/
- Context costs dominate production LLM bills, and the optimal strategy depends on how often preprocessing is reused.