Feed

4 latest posts


AK (_akhaliq) · X · Post · 3 days ago

BeliefTrack: Belief Management for Long-Horizon LLM Reasoning

RT by @_akhaliq: When should LLMs update, preserve, or ignore information? Contextual Belief Management is what long-horizon reasoning was missing. We introduce BeliefTrack—and show that optimizing belief states cuts reasoning failures by over 70%.

›BeliefTrack is a framework for managing contextual beliefs in LLMs.

#LLM #Reasoning #Belief Management

AK (_akhaliq) · HF Papers · Paper · 4 days ago


Thinking Before Constraining: A Unified Decoding Framework for Large Language Models

›Natural generation lets LLMs produce free-form responses with rich reasoning, but those responses are hard to verify; constrained decoding guarantees a well-formed output format but restricts reasoning.
›The In-Writing method combines free-form reasoning with structured generation, using a trigger token to separate the two.
›The model first reasons without constraints, then switches to structured decoding once the trigger token is generated, while avoiding premature triggering.
›It improves accuracy by up to 27% over natural generation on classification and reasoning datasets.
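A minimal Python sketch of the two-phase idea, assuming a toy greedy decoder: the model decodes freely until it emits a trigger token, after which the next token is restricted to an allowed label set. The trigger string `<answer>`, the scripted `toy_model` scorer, and the `min_free` rule used to block premature triggering are hypothetical illustrations, not the paper's actual mechanism or API.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical trigger token that ends the free-form reasoning phase.
TRIGGER = "<answer>"

ScoreFn = Callable[[List[str]], Dict[str, float]]


def argmax(scores: Dict[str, float], allowed: Optional[Sequence[str]] = None) -> str:
    """Greedy pick; if `allowed` is given, only those tokens are eligible."""
    if allowed is not None:
        scores = {tok: scores.get(tok, float("-inf")) for tok in allowed}
    return max(scores, key=scores.get)


def think_then_constrain(
    score_fn: ScoreFn,
    labels: Sequence[str],
    max_free: int = 50,
    min_free: int = 1,
) -> Tuple[List[str], str]:
    tokens: List[str] = []
    # Phase 1: unconstrained reasoning until the trigger token is generated.
    while len(tokens) < max_free:
        scores = dict(score_fn(tokens))
        if len(tokens) < min_free:
            # Assumed rule: mask the trigger for the first `min_free` steps.
            scores.pop(TRIGGER, None)
        tok = argmax(scores)
        tokens.append(tok)
        if tok == TRIGGER:
            break
    else:
        # Reasoning budget exhausted: force the switch to structured output.
        tokens.append(TRIGGER)
    # Phase 2: constrained decoding, only labels from the schema are eligible.
    answer = argmax(score_fn(tokens), allowed=labels)
    return tokens, answer


def toy_model(prefix: List[str]) -> Dict[str, float]:
    """Scripted stand-in for an LLM's next-token scores (hypothetical)."""
    if TRIGGER in prefix:
        # Unconstrained, the toy model would ramble ("I ..."); phase 2 prevents that.
        return {"I": 0.9, "positive": 0.6, "negative": 0.3}
    script = ["the", "review", "praises", "the", "plot"]
    step = len(prefix)
    if step < len(script):
        # Over-eager at step 0: the trigger outscores the first reasoning token.
        return {script[step]: 1.0, TRIGGER: 1.5 if step == 0 else 0.2}
    return {TRIGGER: 1.0, "and": 0.4}


tokens, answer = think_then_constrain(toy_model, labels=("positive", "negative"))
print(tokens, answer)
```

With `min_free=0` the over-eager toy model emits the trigger at the very first step and answers with no reasoning at all, which is the kind of premature triggering the summary says the method avoids.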

#LLM #Constrained Decoding #Structured Generation #Reasoning

AK (_akhaliq) · HF Papers · Paper · 4 days ago


REPOT: Recoverable Program-of-Thought via Checkpoint Repair

›Program-of-Thought (PoT) generates a Python program, but a single error can invalidate the entire plan.
›RePoT re-verifies the plan by executing it in the environment up to the first error, then uses a single LLM call to continue from the verified prefix.
›RePoT improves on PoT by +3 to +11 percentage points across different models, reaching 96.9% versus 86.3%.
›Adaptive RePoT uses a rule-based dispatcher to choose between suffix repair and a fresh PoT retry based on the verified-prefix length.
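The repair loop and the adaptive dispatch can be sketched in Python as below, assuming a PoT program with one simple statement per line and a variable named `answer` that holds the result. `repair_suffix` and `fresh_program` are hypothetical stand-ins for the single LLM repair call and a fresh PoT sample, and the `min_prefix` threshold is an assumed dispatch rule; the summary does not give the paper's actual rules.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical callables standing in for LLM calls.
RepairFn = Callable[[List[str], str], List[str]]
FreshFn = Callable[[], List[str]]


def run_until_first_error(lines: List[str]) -> Tuple[int, Dict[str, object]]:
    """Execute one statement per line; return (verified_prefix_len, namespace)."""
    env: Dict[str, object] = {}
    for i, line in enumerate(lines):
        try:
            exec(line, env)  # sandbox this in practice: it runs model-written code
        except Exception:
            return i, env
    return len(lines), env


def repot(
    program: List[str],
    repair_suffix: RepairFn,
    fresh_program: FreshFn,
    min_prefix: int = 2,
) -> Tuple[Optional[object], str]:
    n_ok, env = run_until_first_error(program)
    if n_ok == len(program):
        return env.get("answer"), "ok"
    # Assumed dispatch rule: repair the suffix only if enough of the program verified.
    if n_ok >= min_prefix:
        prefix = program[:n_ok]
        candidate = prefix + repair_suffix(prefix, program[n_ok])
        mode = "suffix_repair"
    else:
        candidate = fresh_program()
        mode = "fresh_retry"
    n_ok, env = run_until_first_error(candidate)
    answer = env.get("answer") if n_ok == len(candidate) else None
    return answer, mode


buggy = [
    "price = 12",
    "qty = 5",
    "total = price * qyt",  # typo -> NameError at line index 2
    "answer = total - 4",
]


def fake_repair(prefix: List[str], failing_line: str) -> List[str]:
    # Canned output; a real system would prompt an LLM with the verified prefix.
    return ["total = price * qty", "answer = total - 4"]


def fake_fresh() -> List[str]:
    # Canned output standing in for a freshly sampled PoT program.
    return ["answer = 12 * 5 - 4"]


print(repot(buggy, fake_repair, fake_fresh))                # → (56, 'suffix_repair')
print(repot(buggy, fake_repair, fake_fresh, min_prefix=3))  # → (56, 'fresh_retry')
```

Keeping the verified prefix means only the broken tail is regenerated; the threshold decides when the prefix is too short to be worth preserving.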

#LLM #Program-of-Thought #Reasoning #Error Recovery

François Chollet · X · Post · 10 days ago

First step forward in the ARC-AGI-3 competition: Tufa Labs reaches 1.17%

RT by @fchollet: We saw our first meaningful jump in the ARC-AGI-3 competition today
@tufalabs went from 0.68% > 1.17%
My notes:
- .68% is the score of the best template (which is why so many people have this score)
- I'm guessing 1.17% is a novel approach. This also gives them first signal as to what is working. I expect frequent score increases due to this
- Tufa Labs has been a serious contestant with ARC Prize for multiple years now. Very cool to see them continuing with V3
- We have a $25K milestone prize at the end of June for the best oss solution. I wonder if they'll open it up

›Tufa Labs jumped from 0.68% to 1.17% in ARC-AGI-3, moving past the score of the best existing template.

#AGI #ARC Prize #Reasoning
