#acl-2025
#2025-04-04

Native Sparse Attention: the ACL 2025 Best Paper that makes 64k context 11.6× cheaper

DeepSeek and Peking University won the ACL 2025 Best Paper award with NSA, a sparse attention mechanism trained from scratch. Their 27B model beats the dense baseline while running 9× faster in the forward pass and 11.6× faster at decoding with 64k context on an A100.

native-sparse-attention · deepseek · acl-2025
7 min read