
“Best Value for Money” Comes to LLMs Too: Xiaomi Launches MiMo-V2-Flash — and It’s Genuinely Fast

While Western “frontier” models compete on sheer scale and agent complexity, Xiaomi has chosen a different, far more practical game: speed, long context, and pricing that makes developers raise an eyebrow. MiMo-V2-Flash is a MoE model with 309B parameters, but only 15B are active at inference time. It is optimized for agentic workflows and long sessions, and its technical report claims support for context lengths of up to 256k tokens.
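The gap between 309B total and 15B active parameters is the defining trick of Mixture-of-Experts: a router selects a small top-k subset of experts per token, so most weights sit idle on any given forward pass. A minimal sketch of that idea, where the expert count and top-k value are hypothetical illustrations rather than Xiaomi's published architecture:

```python
import math

TOTAL_PARAMS = 309e9   # all expert + shared weights
ACTIVE_PARAMS = 15e9   # weights actually used per token

def route(scores: list[float], k: int) -> dict[int, float]:
    """Pick the top-k experts by router score and renormalize their gates."""
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    z = sum(exps)
    return {i: e / z for i, e in zip(top, exps)}

# Toy router over 4 experts, top-2: only experts 0 and 2 fire for this token.
gates = route([1.2, -0.5, 0.8, 0.1], k=2)
print(f"active experts: {sorted(gates)}")
print(f"active share of weights: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")  # ~4.9%
```

So per-token compute scales with the ~5% of weights that are active, which is what lets a 309B model price and serve like a much smaller one.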

The most interesting part is the results and the positioning. In its own benchmark table, Xiaomi reports 73.4% on SWE-Bench Verified (fixing real bugs from open-source projects), where MiMo-V2-Flash slightly outperforms DeepSeek-V3.2 Thinking and sits close to top closed models. On AIME 2025 and GPQA-Diamond, the model also remains in the “top league.” This should be read soberly: in the same table, Claude Sonnet 4.5 still scores higher on SWE-Bench Verified. But the very fact that open-weights models have moved to within touching distance of the frontier in applied coding is already a noteworthy event.


And now the part that made this spread across feeds. Xiaomi lists API pricing at $0.10 per million input tokens and $0.30 per million output tokens. Compared to the official $3 / $15 pricing of Claude Sonnet 4.5, this is roughly 30× cheaper on input and 50× cheaper on output. If these numbers hold in production (and not just at launch), 2026 could become the year when the winner is not “the smartest at any cost,” but “smart enough to be everywhere.”
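The multiples above are easy to verify from the quoted per-million-token rates; the sample workload at the bottom is an arbitrary illustration, not a published benchmark:

```python
# Rates in $/1M tokens: Xiaomi's listed API price vs Claude Sonnet 4.5 official.
MIMO_IN, MIMO_OUT = 0.10, 0.30
SONNET_IN, SONNET_OUT = 3.00, 15.00

print(f"input:  {SONNET_IN / MIMO_IN:.0f}x cheaper")    # 30x
print(f"output: {SONNET_OUT / MIMO_OUT:.0f}x cheaper")  # 50x

# A hypothetical agent session consuming 10M input and 2M output tokens:
in_m, out_m = 10, 2
mimo_cost = in_m * MIMO_IN + out_m * MIMO_OUT        # $1.60
sonnet_cost = in_m * SONNET_IN + out_m * SONNET_OUT  # $60.00
print(f"${mimo_cost:.2f} vs ${sonnet_cost:.2f}")
```

At long-context agent scale, where input tokens dominate, that 30x input multiple is the number that actually moves the bill.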

One more detail in the context of the Chinese “neural network wars”: the public face of the MiMo direction is said to be Luo Fuli, whose move and role were actively discussed in connection with DeepSeek. Xiaomi itself is already talking about a “human–car–home” strategy and about integrating the model into smartphones, vehicles, and its broader AIoT ecosystem.
