DAIR.AI · X post · 5 days ago
Agents Age Too: Reliability Over Time
RT by @dair_ai:

// Your Agents are Aging Too //

Huh!? They need "sleep," and now they are aging? Joke aside, great write-up on reliable agentic engineering.

This new research introduces AgingBench, a longitudinal reliability benchmark. It organizes agent aging into four mechanisms, including compression aging and interference aging, and measures not just whether deployed agents degrade but what form the degradation takes and where repair should target.

We benchmark agents on day one and then deploy them for months. That gap hides a basic systems question. How long does an agent stay reliable after deployment?

Even with frozen model weights, an agent's effective state keeps shifting. It compresses interaction history, retrieves from a growing memory store, revises facts after updates, and goes through routine maintenance. Reliability becomes a lifespan property of the full harness, not a snapshot of the base model.

Paper: https://arxiv.org/abs/2605.26302

Learn to build effective AI agents in our academy: https://academy.dair.ai/
  • Introduces AgingBench, a benchmark that measures agent reliability over long periods after deployment.