Feed

2 latest items

Jeremy Howard · X · Post · 5 days ago

DiffusionBlocks: Training neural networks in independent blocks

RT by @jeremyphoward: For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall. We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal. This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://arxiv.org/abs/2506.14202), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.

›A new method that trains a neural network one block at a time, independently, instead of backpropagating through the whole network

#Deep learning #Training optimization #Memory efficiency

John Carmack · X · Post · 6 days ago

Proposed change to ReLU to improve gradient flow

It is easy enough to make your own, but I think standard relu should have been defined as passing the value at zero, so gradients flow backward through it, allowing some things to be zero weight initialized when symmetry breaking isn’t an issue.

›Standard ReLU should be defined so that gradients flow backward through it at zero.

#Neural networks #Activation functions #Training optimization
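
The ReLU post above can be illustrated with a small sketch. It assumes one reading of "passing the value at zero": the derivative is taken as 1 when the input is exactly 0, instead of the conventional 0. The function names, the single-unit network, and the numbers are all hypothetical, chosen only to show the effect of a zero-initialized weight; none of this comes from the post itself.

```python
def relu(x):
    """Forward pass is identical for both variants: max(x, 0)."""
    return x if x > 0.0 else 0.0


def relu_grad_standard(x):
    """Conventional choice: derivative is 0 when x == 0."""
    return 1.0 if x > 0.0 else 0.0


def relu_grad_pass_zero(x):
    """Assumed variant: derivative is 1 when x == 0, so gradients pass through."""
    return 1.0 if x >= 0.0 else 0.0


def grad_w1(x, target, w1, w2, relu_grad):
    """Gradient of 0.5 * (w2 * relu(w1 * x) - target)**2 with respect to w1."""
    pre = w1 * x                  # pre-activation; exactly 0 when w1 == 0
    y = w2 * relu(pre)            # network output
    dloss_dy = y - target         # derivative of the squared-error loss
    return dloss_dy * w2 * relu_grad(pre) * x


# Hypothetical single-unit network with a zero-initialized first weight.
x, target, w1, w2 = 2.0, 1.0, 0.0, 0.5

g_standard = grad_w1(x, target, w1, w2, relu_grad_standard)
g_pass_zero = grad_w1(x, target, w1, w2, relu_grad_pass_zero)

print("standard ReLU gradient for w1: ", g_standard)
print("pass-at-zero gradient for w1:  ", g_pass_zero)
```

With the conventional derivative, the zero-initialized weight receives a zero gradient and never moves. With the variant, it receives a nonzero gradient on the first step. There is only one unit here, so symmetry breaking does not arise, which is the case the post singles out.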