MiMo M2.5

MiMo M2.5 is the mid-tier model in Xiaomi's MiMo v2.5 family, a Mixture-of-Experts (MoE) stack with reasoning, tool use, and multimodal input. It supports a context window of 1.1M tokens and a maximum output of 131.1K tokens.

Reasoning, Tool Use, Implicit Caching, File Input, Vision (Image)
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'xiaomi/mimo-v2.5',
  prompt: 'Why is the sky blue?'
})

Playground

Try out MiMo M2.5 by Xiaomi. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Ask MiMo M2.5 anything to try it out.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
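A minimal sketch of how a provider preference might be expressed in code. It assumes the AI SDK reads a priority list of provider slugs from `providerOptions.gateway.order`; the helper `withProviderOrder` is hypothetical and only builds the options object, so confirm the exact option names in the docs before relying on it.

```typescript
// Hypothetical helper: builds request options with a provider preference.
// Assumption: the gateway reads `providerOptions.gateway.order` as a priority list.
type GatewayRequest = {
  model: string;
  prompt: string;
  providerOptions: { gateway: { order: string[] } };
};

function withProviderOrder(prompt: string, order: string[]): GatewayRequest {
  if (order.length === 0) {
    throw new Error('order must list at least one provider slug');
  }
  return {
    model: 'xiaomi/mimo-v2.5',
    prompt,
    providerOptions: { gateway: { order: [...order] } },
  };
}

// Prefer DeepInfra first, then Xiaomi.
const request = withProviderOrder('Why is the sky blue?', ['deepinfra', 'xiaomi']);
console.log(JSON.stringify(request.providerOptions));
```

Under the same assumption, the resulting object could be passed straight to `streamText(request)`.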

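| Provider | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search | Per Query | Capabilities | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Xiaomi | 1.1M | 2.4s | 112tps | $0.14/M | $0.28/M | $0.0/M | — | — | — | +3 | | | 04/22/2026 |
| DeepInfra | 262K | 0.6s | 17tps | $0.40/M | $2.00/M | $0.08/M | — | — | — | +3 | | | 04/22/2026 |
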
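
To compare the two price points, a rough per-request cost can be worked out from the listed input and output rates. This is a sketch only: `estimateCostUsd` is a hypothetical helper, and it ignores cache-read discounts and any other billing details.

```typescript
// Prices in USD per million tokens, copied from the provider table.
type Pricing = { inputPerM: number; outputPerM: number };

const PRICING: Record<string, Pricing> = {
  xiaomi: { inputPerM: 0.14, outputPerM: 0.28 },
  deepinfra: { inputPerM: 0.40, outputPerM: 2.00 },
};

// Hypothetical helper: cost of one request, ignoring cache discounts.
function estimateCostUsd(provider: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[provider];
  if (!p) throw new Error(`unknown provider: ${provider}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

for (const provider of Object.keys(PRICING)) {
  const usd = estimateCostUsd(provider, 100_000, 10_000);
  console.log(`${provider}: $${usd.toFixed(4)}`);
}
```

For 100,000 input tokens and 10,000 output tokens, this works out to about $0.0168 on Xiaomi and $0.0600 on DeepInfra.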
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
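
The two metrics combine into a rough end-to-end estimate: time to first token plus output length divided by throughput. The sketch below uses the P50 figures from the provider table and assumes throughput stays constant for the whole response, which live traffic will not guarantee.

```typescript
// Rough end-to-end time: time to first token plus generation time at a steady rate.
// Assumption: throughput is constant for the whole response.
function estimateSeconds(ttftSeconds: number, tokensPerSecond: number, outputTokens: number): number {
  if (tokensPerSecond <= 0) throw new Error('tokensPerSecond must be positive');
  return ttftSeconds + outputTokens / tokensPerSecond;
}

// P50 figures from the provider table: Xiaomi 2.4s / 112tps, DeepInfra 0.6s / 17tps.
const xiaomiSeconds = estimateSeconds(2.4, 112, 1000);
const deepinfraSeconds = estimateSeconds(0.6, 17, 1000);
console.log(xiaomiSeconds.toFixed(1), deepinfraSeconds.toFixed(1));
```

For a 1,000-token response this gives roughly 11.3 seconds on Xiaomi and 59.4 seconds on DeepInfra, so under these assumptions the provider with the lower TTFT is only faster overall for very short outputs.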

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Xiaomi

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1.1M | 0.4s | 57tps | $0.43/M | $0.87/M | $0.0/M | — | — | — | +1 | DeepInfra, Xiaomi | | | 04/22/2026 |
| | 1M | 1.9s | 59tps | $1.00/M | $3.00/M | $0.2/M | — | — | — | +1 | Xiaomi | | | 03/18/2026 |
| | 262K | 1.6s | 107tps | $0.10/M | $0.30/M | $0.01/M | — | — | — | +1 | Xiaomi | | | 12/16/2025 |

About MiMo M2.5

MiMo M2.5 is a MoE language model from Xiaomi, released April 22, 2026 under the MIT license. Each forward pass activates only a subset of the total parameters, which keeps per-token compute lower than a dense model at the same parameter count.
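
As a toy illustration of activating a subset of parameters, the sketch below routes one token to the top-k experts by score. It is a generic top-k router with made-up scores and expert counts, not a description of MiMo's actual routing, which this page does not specify.

```typescript
// Toy top-k router: picks the k highest-scoring experts for one token.
// Assumption: illustrative only; scores and expert counts are invented.
function topKExperts(scores: number[], k: number): number[] {
  return scores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((e) => e.index);
}

// Fraction of expert parameters touched per token when k of n equal-sized experts run.
function activeFraction(k: number, n: number): number {
  return k / n;
}

const routerScores = [0.1, 0.7, 0.05, 0.9, 0.3, 0.2, 0.6, 0.4]; // made-up scores for 8 experts
console.log(topKExperts(routerScores, 2)); // experts 3 and 1
console.log(activeFraction(2, 8));         // 0.25
```

With 2 of 8 equal-sized experts active, only a quarter of the expert parameters take part in that token's forward pass.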

The architecture uses hybrid attention, interleaving sliding-window and full attention to cut KV-cache storage at long sequence lengths. A multi-token prediction (MTP) block raises output tokens per step during inference. The full window of 1.1M tokens lets MiMo M2.5 reason over large documents, repos, or long agent trajectories.
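
The KV-cache saving from interleaving can be sketched with simple counting: a full-attention layer caches every position, while a sliding-window layer caches at most its window. The layer counts and window size below are invented for illustration; the page gives no such figures for MiMo M2.5.

```typescript
// Counts cached token positions across layers for a given sequence length.
// Assumption: layer counts and window size are hypothetical.
function cachedPositions(seqLen: number, fullLayers: number, slidingLayers: number, window: number): number {
  const full = fullLayers * seqLen;
  const sliding = slidingLayers * Math.min(seqLen, window);
  return full + sliding;
}

const seqLen = 1_000_000;
const allFull = cachedPositions(seqLen, 48, 0, 0);   // 48 full-attention layers
const hybrid = cachedPositions(seqLen, 8, 40, 4096); // 8 full + 40 sliding-window layers
console.log((hybrid / allFull).toFixed(3));
```

With these made-up numbers the hybrid layout caches about 17% of the positions that an all-full-attention stack would at a 1M-token sequence.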

MiMo M2.5 supports reasoning, tool calling, file input, vision, and implicit prompt caching. Call it through the xiaomi or deepinfra provider via AI Gateway. For the higher-capability tier, see mimo-v2.5-pro.

What To Consider When Choosing a Provider

  • Configuration: MiMo M2.5 balances cost, capability, and context length. The MoE design keeps active compute small, but routing and serving a 300B-class MoE still requires capable infrastructure on the provider side. Use AI Gateway's cost tracking and model fallback to mix MiMo M2.5 with mimo-v2.5-pro on harder workloads.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
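
A minimal sketch of the authentication choice described in the last bullet. The environment variable names `AI_GATEWAY_API_KEY` and `VERCEL_OIDC_TOKEN` and the Bearer header format are assumptions, and `gatewayAuthHeader` is a hypothetical helper; check the documentation for the exact mechanism.

```typescript
// Hypothetical helper: chooses a credential and builds an Authorization header.
// Assumptions: env var names AI_GATEWAY_API_KEY and VERCEL_OIDC_TOKEN, and Bearer auth.
type Env = Record<string, string | undefined>;

function gatewayAuthHeader(env: Env): { Authorization: string } {
  // Prefer the API key; fall back to the OIDC token if the key is missing or empty.
  const token = env['AI_GATEWAY_API_KEY'] || env['VERCEL_OIDC_TOKEN'];
  if (!token) {
    throw new Error('Set AI_GATEWAY_API_KEY or VERCEL_OIDC_TOKEN');
  }
  return { Authorization: `Bearer ${token}` };
}

console.log(gatewayAuthHeader({ AI_GATEWAY_API_KEY: 'example-key' }));
```

Taking the environment as a parameter keeps the helper easy to test; in an application it could be called with `process.env`.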

When to Use MiMo M2.5

Best For

  • Agentic Workflows: Tool-using agents that string together many calls in one session
  • Software Engineering: Code generation, refactors, and repo-scale analysis with a window of 1.1M tokens
  • Multimodal Input: Reasoning over mixed text, images, and uploaded files
  • Long Context: Documents, codebases, or chat histories that approach 1.1M tokens
  • Cost-Aware Reasoning: Lower active-parameter compute than dense models at similar scores
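
For the long-context case, a quick pre-flight check can tell whether a prompt is likely to fit. The sketch below uses the rounded 1.1M and 131.1K figures from this page, assumes about 4 characters per token (a crude heuristic that varies by language and tokenizer), and assumes output tokens count against the same window.

```typescript
const CONTEXT_WINDOW = 1_100_000; // 1.1M tokens, rounded figure from the model description
const MAX_OUTPUT = 131_100;       // 131.1K tokens, rounded figure from the model description

// Assumption: roughly 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical check: does the prompt plus a reserved output budget fit the window?
function fitsInContext(promptTokens: number, reservedOutput: number = MAX_OUTPUT): boolean {
  return promptTokens + Math.min(reservedOutput, MAX_OUTPUT) <= CONTEXT_WINDOW;
}

console.log(fitsInContext(estimateTokens('x'.repeat(2_000_000)))); // true: 500,000 + 131,100
console.log(fitsInContext(1_000_000));                             // false: 1,131,100 exceeds the window
```

For real workloads, a tokenizer-based count would be more reliable than the character heuristic.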

Consider Alternatives When

  • Maximum Reasoning Depth: mimo-v2.5-pro activates more parameters per step on the hardest math and engineering tasks
  • Speed-First Throughput: mimo-v2-flash is the throughput-tuned option from the previous generation
  • Simple Classification: A smaller, cheaper model handles short extraction at lower cost
  • Strict Text Pipelines: A text-only model is fine when your inputs are never images or files

Conclusion

MiMo M2.5 is the standard tier of Xiaomi's MiMo v2.5 family. Use it for agentic workflows, multimodal input, code, and long-context analysis. Pair it with mimo-v2.5-pro through AI Gateway routing so harder jobs land on the higher tier.
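
One way to picture the pairing is a small router that picks a model per task. This is a hypothetical sketch: the `hard` flag would come from your own heuristics, and the `xiaomi/mimo-v2.5-pro` slug is assumed to follow the same pattern as `xiaomi/mimo-v2.5`.

```typescript
// Hypothetical router: send harder jobs to the pro tier, everything else to the standard tier.
// Assumption: the 'xiaomi/mimo-v2.5-pro' slug mirrors 'xiaomi/mimo-v2.5'.
type Task = { prompt: string; hard: boolean };

function pickModel(task: Task): string {
  return task.hard ? 'xiaomi/mimo-v2.5-pro' : 'xiaomi/mimo-v2.5';
}

const tasks: Task[] = [
  { prompt: 'Summarize this changelog.', hard: false },
  { prompt: 'Prove this inequality step by step.', hard: true },
];
for (const task of tasks) {
  console.log(pickModel(task), '<-', task.prompt);
}
```

The chosen slug can then be used as the `model` value in a `streamText` call.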