#618 2026-01-02
Xiaomi's MiMo-V2.5-ASR: 8B Open-Source Speech Model Beats Whisper by 23% — Speaks Cantonese, Wu, Hokkien, Sings Too
Xiaomi just open-sourced MiMo-V2.5-ASR, an 8B-parameter end-to-end speech recognition model that posts a 5.73 average WER on the Open ASR Leaderboard, ahead of Whisper-large-v3 (7.44) and Seed-ASR 2.0 (8.09) and on par with Qwen3-ASR-1.7B. Native support for Wu, Cantonese, Hokkien, and Sichuanese; Chinese–English code-switching without language tags; and lyrics transcription that actually works. Weights and code are on Hugging Face, GitHub, and ModelScope under Apache-2.0.
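The 23% in the headline appears to be the relative WER reduction against Whisper-large-v3, not an absolute gap in percentage points. That reading is an assumption, but the quoted leaderboard numbers are consistent with it. A minimal sketch of the arithmetic follows; the helper name `relative_wer_reduction` is hypothetical and only the WER figures come from the text above.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (lower WER is better)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Average WER figures as quoted in the blurb (Open ASR Leaderboard).
MIMO_WER = 5.73
BASELINES = {"Whisper-large-v3": 7.44, "Seed-ASR 2.0": 8.09}

reductions = {
    name: relative_wer_reduction(wer, MIMO_WER) for name, wer in BASELINES.items()
}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1%} relative WER reduction")
# Whisper-large-v3: 23.0% relative WER reduction
# Seed-ASR 2.0: 29.2% relative WER reduction
```

In absolute terms the gap to Whisper-large-v3 is 1.71 WER points (7.44 minus 5.73); dividing that by the 7.44 baseline gives the roughly 23% figure.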