Feed

2 latest items
DAIR.AI · X post · 3 days ago
Scaling Laws for Agent Harnesses
RT by @dair_ai: // Scaling Laws for Agent Harnesses // If you build agent harnesses, this one is worth your time. (bookmark it) Most harness tuning treats every token and tool call as if volume is all that counts. New research shows that most of that volume does not count. The work introduces Effective Feedback Compute (EFC), a coordinate that counts only the feedback an agent can actually act on. Raw token and tool-call counts explain agent failure with an R² of 0.33 to 0.42. EFC pushes that to 0.99. Why does it matter? Once you budget by useful feedback instead of raw volume, reallocation alone lifts success from 0.27 to 0.90 at the same compute. This also turns harness design from guesswork into something you can predict. Paper: https://arxiv.org/abs/2605.29682 Learn to build effective AI agents in our academy: https://academy.dair.ai/
  • Introduces Effective Feedback Compute (EFC), a metric that counts only the feedback an agent can actually act on, rather than counting every token and tool call.
DAIR.AI · X post · 5 days ago
Stronger Models Do Not Need More Complex Harnesses
Stronger models do not always need lighter harnesses. Everyone believes more structured harnesses universally improve reliability, and that higher-capability models need proportionally less structural guidance. Together, that implies a clean inverse relationship between model tier and optimal harness complexity. This new research tests it with a controlled 432-run experiment, six models across four capability tiers crossed with three harness conditions, on a 24-task benchmark with git-based workspace verification. For a frontier chat model, increasing harness verbosity dropped success by 29 to 38 percentage points. They call it the harness-complexity paradox. Paper: https://arxiv.org/abs/2605.26731 Learn to build effective AI agents in our academy: https://academy.dair.ai/
  • Counterintuitive finding: increasing harness complexity reduced the performance of stronger models. For a frontier chat model, more verbose harnesses dropped success by 29 to 38 percentage points.
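
The 432-run figure quoted in the post is consistent with a full grid of six models, three harness conditions, and 24 tasks, assuming one run per combination (the post does not say whether runs were repeated). A minimal Python sketch of that count follows; the model, harness, and task labels are hypothetical placeholders, since the post names none of them.

```python
from itertools import product

# Hypothetical placeholder labels: the post does not name the models,
# the harness conditions, or the benchmark tasks.
models = [f"model_{i}" for i in range(1, 7)]          # six models
harnesses = ["harness_a", "harness_b", "harness_c"]   # three harness conditions
tasks = [f"task_{i:02d}" for i in range(1, 25)]       # 24-task benchmark

# Assumption: exactly one run per (model, harness, task) combination.
runs = list(product(models, harnesses, tasks))
print(len(runs))  # 432
```

Under that assumption, each model contributes 72 runs (3 harness conditions × 24 tasks).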