#164 · 2025-05-17
Microsoft MEMENTO: LLMs that compress their own chain-of-thought
Microsoft Research teaches reasoning models to summarise their own thinking mid-generation: 2.5× smaller peak KV cache, roughly 2× higher throughput, and a surprising 'hidden channel' in the KV states that on its own is worth 15 accuracy points on AIME24.