Feed

3 latest items

Jeremy Howard · X · Post · 5 days ago

Anthropic is too expensive and will either lose customers or be forced to cut prices

RT by @jeremyphoward: Anthropic is too expensive and will either lose customers or cut prices

›Anthropic's pricing is described as too high relative to the value it delivers

#LLM pricing #Anthropic #Market competition

Jeremy Howard · X · Post · 5 days ago

Behind the MiMo API price cut: KV cache optimization and sparse architecture

RT by @jeremyphoward: Behind the MiMo API Price Reduction: The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token capacity by 5x, equivalent to an 80% reduction in caching costs. Combined with Cache Read Overlap among multiple Full Attention modules in the Hybrid model, actual costs are further reduced.

Prices for Input (Cache Miss) and Output are also reduced by 60%-80%. This mainly benefits from the extreme 1:7 Full:SWA sparsity ratio brought by the model architecture (the prefill compute of the 70-layer MiMo-V2.5-Pro roughly equals a 10-layer GQA model). This kept our original inference costs well below the industry average, naturally leaving a 2x-3x profit margin in pricing. This price adjustment simply reflects our decision to pass these structural cost efficiencies directly to developers.

Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even. We previously advised LLM companies not to "blindly cut prices" precisely because very few model architectures and inference optimizations can keep API costs from running at a loss. If more architectures that save compute and KV cache emerge, along with better inference Infra to drive down API costs, this will form an excellent virtuous cycle in the industry.

More crucially, affordable, high-performance model APIs will drive real, sustained, and at-scale inference demand. This upstream demand pulls forward the development of the entire AI infrastructure chain (including chips, servers, optical transceivers, PCBs, liquid cooling, power, energy storage, and data centers), serving as a strategic fulcrum for a systemic revaluation of AI hardware. In the long run, this injects more affordable and accessible compute into both training and inference pipelines, accelerating the parallel evolution of global AGI across multiple regions and technical routes.

For more technical details, we will release a detailed Blog post later.

›MiMo cuts API prices by up to 99% for cached input tokens thanks to hierarchical KV cache optimization

#LLM pricing #KV cache #Inference optimization
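
The figures quoted in the MiMo post can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope model, not Xiaomi's cost accounting: every function name is made up for illustration, caching cost is assumed to scale inversely with how many tokens the same hardware can hold, and the `window_to_context` ratio is a hypothetical parameter that the post does not state.

```python
"""Back-of-the-envelope checks for the figures quoted in the MiMo post.

All names here are illustrative; none come from Xiaomi's code or API.
"""


def caching_cost_reduction(capacity_multiplier: float) -> float:
    """Fractional drop in per-token caching cost when the same hardware
    holds `capacity_multiplier` times as many cached tokens (assumption:
    cost per cached token is inversely proportional to capacity)."""
    if capacity_multiplier <= 0:
        raise ValueError("capacity_multiplier must be positive")
    return 1.0 - 1.0 / capacity_multiplier


def full_attention_layers(total_layers: int, full: int = 1, swa: int = 7) -> float:
    """Number of full-attention layers implied by a full:swa layer ratio."""
    return total_layers * full / (full + swa)


def equivalent_full_layers(total_layers: int, window_to_context: float,
                           full: int = 1, swa: int = 7) -> float:
    """Rough prefill attention cost expressed in full-attention layers.

    Assumption: a sliding-window layer costs `window_to_context` times a
    full-attention layer (window size divided by context length). This
    ratio is hypothetical; the post does not give it.
    """
    n_full = full_attention_layers(total_layers, full, swa)
    n_swa = total_layers - n_full
    return n_full + n_swa * window_to_context


def break_even_price_cut(price_to_cost: float) -> float:
    """Largest fractional price cut that still covers cost, holding the
    cost per token fixed (assumption: no other cost change)."""
    if price_to_cost <= 0:
        raise ValueError("price_to_cost must be positive")
    return 1.0 - 1.0 / price_to_cost


if __name__ == "__main__":
    print("cache cost reduction at 5x capacity:", caching_cost_reduction(5))
    print("full-attention layers in 70 at 1:7:", full_attention_layers(70))
    print("equivalent layers (hypothetical ratio 0.02):",
          equivalent_full_layers(70, 0.02))
    for multiple in (2.0, 3.0):
        print(f"break-even cut at {multiple}x price/cost:",
              break_even_price_cut(multiple))
```

Under these assumptions, a 5x capacity gain corresponds to the 80% caching cost reduction the post cites, and a 1:7 ratio leaves fewer than nine of the 70 layers as full attention, which is in line with the "roughly a 10-layer model" comparison once the sliding-window layers are given a small share. With costs held fixed, a 2x-3x price-to-cost multiple covers cuts of 50% to about 67%, so the deeper cuts mentioned would presumably also rely on the lower serving costs the post describes.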

Jeremy Howard · X · Post · 5 days ago

Xiaomi MiMo v2.5 offers extremely good value

Wow. It looks like the @XiaomiMiMo v2.5 model is insanely good value :O (Price for each prompt shown after each answer. Context includes >40k tool descriptions, system prompt, skills, etc.)

›The Xiaomi MiMo v2.5 model shows very impressive cost efficiency with a very large context (>40k tokens)

#Model value #LLM pricing #Performance