#77 · 2025-04-04
Native Sparse Attention: the ACL 2025 Best Paper that makes 64k context 11.6× cheaper
DeepSeek and Peking University win the ACL 2025 Best Paper award with NSA, a sparse attention mechanism trained from scratch rather than retrofitted onto a dense model. Their 27B model beats the dense baseline while running 9× faster on the forward pass and 11.6× faster at decoding with 64k context on an A100.
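To give a feel for the idea, here is a toy sketch of top-k block selection, the rough intuition behind sparse attention's savings: score whole blocks of keys cheaply, keep only the best few, and run full softmax attention inside the survivors. This is an illustrative simplification with made-up names, not the paper's actual three-branch design or its hardware kernel.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size=4, top_k=2):
    """Toy top-k block-sparse attention for a single query vector.

    Coarsely score each block of keys by its mean, keep the top_k
    blocks, then run ordinary softmax attention over only those
    tokens. Illustrative only; NSA's real selection is learned and
    runs alongside compression and sliding-window branches.
    """
    n, d = k.shape
    n_blocks = n // block_size
    # Coarse score per block: query dotted with the block's mean key.
    block_means = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(axis=1)
    block_scores = block_means @ q
    keep = np.argsort(block_scores)[-top_k:]  # indices of the selected blocks
    idx = np.concatenate(
        [np.arange(b * block_size, (b + 1) * block_size) for b in keep]
    )
    # Exact attention restricted to the selected tokens.
    scores = (k[idx] @ q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ v[idx]

rng = np.random.default_rng(0)
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
q = k[5].copy()  # a query resembling one of the keys
out = block_sparse_attention(q, k, v)
print(out.shape)  # (8,)
```

The payoff is that only `top_k * block_size` keys enter the softmax instead of all `n`, which is where the decoding speedup at long context comes from; NSA's contribution is making that selection trainable end to end and fast on real GPUs.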