#3212025-08-04
Meta's REFRAG: 30× Faster RAG Decoding Without Losing Accuracy
Meta Superintelligence Labs just shipped REFRAG — a decoding framework that compresses RAG context into chunk embeddings, hitting 30.85× faster time-to-first-token, 16× longer context, and zero perplexity loss. No LLM retraining required.