Feed

2 latest items
Jeremy Howard
X post · 5 days ago
DiffusionBlocks: Training neural networks as independent blocks
RT by @jeremyphoward: For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
  • A new method trains a neural network one block at a time, each block independently, instead of backpropagating through the whole network.
John Carmack
X post · 6 days ago
Proposed change to the ReLU definition to improve gradient flow
It is easy enough to make your own, but I think standard relu should have been defined as passing the value at zero, so gradients flow backward through it, allowing some things to be zero weight initialized when symmetry breaking isn’t an issue.
  • Standard ReLU should be defined so that gradients flow backward through it at an input of exactly 0.
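
The point about zero-initialized weights can be illustrated with a toy example. The sketch below is not from the post: the function names, the single-weight model, and the squared-error loss are all assumptions chosen for illustration. It compares the usual convention (derivative taken as 0 at an input of exactly 0) with a variant that passes the gradient at 0.

```python
# Minimal sketch (illustrative names, not from the post): how the choice of
# ReLU derivative at exactly 0 affects a zero-initialized weight.

def relu(x):
    """Forward pass: identical for both variants."""
    return x if x > 0.0 else 0.0

def relu_grad_standard(x):
    """Usual convention: derivative is 0 at x == 0."""
    return 1.0 if x > 0.0 else 0.0

def relu_grad_pass_zero(x):
    """Assumed variant: derivative is 1 at x == 0, so the gradient passes."""
    return 1.0 if x >= 0.0 else 0.0

def weight_grad(w, x, target, relu_grad):
    """Gradient of 0.5 * (relu(w * x) - target) ** 2 with respect to w."""
    z = w * x                      # pre-activation; exactly 0 when w == 0
    return (relu(z) - target) * relu_grad(z) * x

# Hypothetical setup: weight initialized to zero, input 2.0, target 1.0.
g_std = weight_grad(0.0, 2.0, 1.0, relu_grad_standard)
g_pass = weight_grad(0.0, 2.0, 1.0, relu_grad_pass_zero)

print("standard ReLU gradient: ", g_std)
print("pass-at-zero gradient:  ", g_pass)
```

With the standard convention the zero-initialized weight receives no gradient and cannot move; with the pass-at-zero variant it receives a nonzero gradient. As the post notes, this only helps where symmetry breaking is not a concern.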