
Qwen 3.7 Max

Qwen 3.7 Max is Alibaba's flagship agent-tuned model in the Qwen 3.7 line, with a context window of 991K tokens and an emphasis on long-horizon tool use, multi-file coding, and office workflow automation.

Implicit Caching, Reasoning, Tool Use
index.ts
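import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3.7-max',
  prompt: 'Why is the sky blue?',
})

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}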

Playground

Try out Qwen 3.7 Max by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.


Ask Qwen 3.7 Max anything to try it out.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

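| Provider | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alibaba | 991K | 1.9s | 57tps | $1.25/M | $3.75/M | Read: $0.25/M, Write: $1.56/M | — | — | +1 | | | 05/21/2026 |
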
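
The per-million-token prices listed for Alibaba translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming that cached prompt tokens are billed at the cache read rate and newly cached tokens at the cache write rate; the `TokenUsage` shape and `estimateCostUsd` function are hypothetical names for illustration, and actual billing rules may differ.

```typescript
// Prices in USD per one million tokens, copied from the Alibaba provider listing.
const PRICES_PER_MILLION = {
  input: 1.25,
  output: 3.75,
  cacheRead: 0.25,
  cacheWrite: 1.56,
} as const;

// Hypothetical usage shape for illustration; not an AI Gateway type.
interface TokenUsage {
  inputTokens: number; // uncached prompt tokens
  outputTokens: number; // generated tokens
  cacheReadTokens?: number; // prompt tokens served from cache (assumed billing)
  cacheWriteTokens?: number; // prompt tokens written to cache (assumed billing)
}

// Sums tokens × price for each category, then converts from per-million pricing.
function estimateCostUsd(usage: TokenUsage): number {
  const total =
    usage.inputTokens * PRICES_PER_MILLION.input +
    usage.outputTokens * PRICES_PER_MILLION.output +
    (usage.cacheReadTokens ?? 0) * PRICES_PER_MILLION.cacheRead +
    (usage.cacheWriteTokens ?? 0) * PRICES_PER_MILLION.cacheWrite;
  return total / 1_000_000;
}
```

Under these assumptions, a request with 200,000 input tokens and 10,000 output tokens comes to 200,000 × $1.25/M + 10,000 × $3.75/M = $0.25 + $0.0375, or about $0.29.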
Throughput

P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.

Latency

P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.

Uptime

Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.

More models by Alibaba

| Model | Context | Latency | Throughput | Input | Output | Cache | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (missing) | 1M | 1.3s | 303tps | $0.32/M | $1.28/M | Read: $0.08/M, Write: $0.5/M | — | — | +3 | alibaba, fireworks, togetherai | | | 06/02/2026 |
| (missing) | 240K | 1.3s | 84tps | $1.30/M | $7.80/M | Read: $0.26/M, Write: $1.63/M | — | — | +1 | alibaba | | | 04/20/2026 |
| (missing) | 1M | 0.2s | 118tps | $0.50/M | $3.00/M | Read: $0.1/M, Write: $0.63/M | — | — | +3 | alibaba, fireworks, togetherai | | | 04/02/2026 |
| (missing) | 1M | 0.9s | 144tps | $0.10/M | $0.40/M | Read: $0.0/M, Write: $0.13/M | — | — | +3 | alibaba | | | 02/24/2026 |
| (missing) | 33K | | | $0.05/M | | | — | — | | deepinfra | | | 06/05/2025 |
| (missing) | 41K | 0.3s | 63tps | $0.12/M | $0.24/M | | — | — | | deepinfra | | | 04/28/2025 |

About Qwen 3.7 Max

Qwen 3.7 Max is the Max-tier release in the Qwen 3.7 generation, succeeding Qwen3.6-Max-Preview in Alibaba's closed-weight API line. The model is served through Alibaba with a context window of 991K tokens, which suits full-repository ingestion, long agent traces, and multi-document analysis without splitting the input into chunks.

Alibaba describes Qwen 3.7 Max as designed to be an agent foundation. The model targets coding agents that plan and act across many turns, office and productivity tasks that route work through multi-agent orchestration, and long-horizon autonomous execution where the model must maintain coherent reasoning across hundreds of sequential tool calls. Reported improvements over Qwen3.6-Max-Preview are concentrated in frontend prototyping and complex multi-file engineering work.

Like other Max-tier entries, Qwen 3.7 Max supports tool calling and structured outputs, with extended-thinking mode available for high-difficulty reasoning, scientific computation, and expert-level queries. The thinking budget can be tuned per request to balance depth of reasoning against latency and token spend. Qwen 3.7 Max is text-only; for vision input, the sibling Qwen3.7-Plus is the multimodal entry in the 3.7 lineup.
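
Tool calling follows a request/response loop: the request declares the available tools, the model replies with a tool call, and the application runs the tool and sends the result back. The following is a minimal sketch of the application side, assuming the OpenAI-style Chat Completions tool format. The `get_weather` tool, its stubbed handler, and the `runToolCall` helper are hypothetical, and no network request is made.

```typescript
// Tool declaration in the OpenAI-style Chat Completions format.
// `get_weather` is a hypothetical tool used only for illustration.
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Look up the current temperature for a city.',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
] as const;

// Shape of one tool call as it appears in an assistant message.
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string }; // arguments is a JSON string
}

// Local implementations keyed by tool name. The returned data is stubbed.
const handlers: Record<string, (args: any) => unknown> = {
  get_weather: (args: { city: string }) => ({ city: args.city, tempC: 21 }),
};

// Runs one tool call and builds the `tool` message to append to the conversation.
// Errors are reported back as content so a long agent loop can recover.
function runToolCall(call: ToolCall) {
  const reply = (payload: unknown) => ({
    role: 'tool' as const,
    tool_call_id: call.id,
    content: JSON.stringify(payload),
  });

  const handler = handlers[call.function.name];
  if (!handler) {
    return reply({ error: `unknown tool: ${call.function.name}` });
  }
  try {
    return reply(handler(JSON.parse(call.function.arguments)));
  } catch (err) {
    return reply({ error: `tool failed: ${String(err)}` });
  }
}
```

In a multi-turn agent, each returned `tool` message would be appended to the message list and the conversation sent again, together with `tools`, until the model stops requesting tool calls.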

You can integrate Qwen 3.7 Max through AI SDK, Chat Completions API, Responses API, Messages API, or other API formats, from TypeScript or Python.
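
As an illustration of the Chat Completions route, the sketch below builds and sends a request with `fetch`. It assumes an OpenAI-compatible endpoint and bearer-token authentication with an AI Gateway API key. The base URL is left as a parameter because it is not given here and should be taken from the AI Gateway documentation; `buildChatRequest` and `chat` are hypothetical helper names.

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Builds the fetch options for a Chat Completions call (no network access here).
function buildChatRequest(apiKey: string, messages: ChatMessage[]) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify({ model: 'alibaba/qwen3.7-max', messages }),
  };
}

// Sends a single-turn prompt and returns the assistant's text.
// `baseUrl` is the gateway's OpenAI-compatible base URL (an assumption; see the docs).
async function chat(baseUrl: string, apiKey: string, prompt: string): Promise<string> {
  const response = await fetch(
    `${baseUrl}/chat/completions`,
    buildChatRequest(apiKey, [{ role: 'user', content: prompt }]),
  );
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0]?.message.content ?? '';
}
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before any tokens are spent.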

What To Consider When Choosing a Provider

  • Configuration: Agent workflows that chain hundreds of tool calls produce high output-token volume. Use the AI Gateway cost dashboard to monitor per-session spend and tune the thinking budget before running production traffic at scale.
  • Zero Data Retention: AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use Qwen 3.7 Max

Best For

  • Long-Horizon Coding Agents: Sustained tool-calling sessions across many turns with planning, retries, and dead-end recovery
  • Multi-File Software Engineering: Refactoring, diff editing, and frontend prototyping across a repository
  • Office Workflow Automation: Routing productivity tasks through multi-agent orchestration
  • Expert Reasoning Tasks: Scientific computation, mathematics, and structured analysis with extended-thinking mode
  • Repository Ingestion: Long-context workloads that use the 991K-token window for full codebases and tool traces
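
Before sending a full repository, it can help to check whether it is likely to fit in the window. The following is a rough sketch, assuming about four characters per token (a common heuristic for English text and code, not the Qwen tokenizer's actual ratio) and reading 991K as 991,000 tokens. The function names and the default output reserve are hypothetical.

```typescript
const CONTEXT_WINDOW_TOKENS = 991_000; // assumed reading of "991K"
const CHARS_PER_TOKEN = 4; // heuristic only; real tokenization varies by content

// Rough token estimate from character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Returns true when the combined files, plus room reserved for the model's
// output, are estimated to fit inside the context window.
function fitsInContext(files: string[], reservedOutputTokens = 32_000): boolean {
  const promptTokens = files.reduce((sum, file) => sum + estimateTokens(file), 0);
  return promptTokens + reservedOutputTokens <= CONTEXT_WINDOW_TOKENS;
}
```

Because the estimate is approximate, inputs that land close to the limit are better measured with a real tokenizer before relying on them fitting.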

Consider Alternatives When

  • Vision Or Multimodal Input: Qwen3.7-Plus is the multimodal entry in the 3.7 line when image inputs are needed
  • Latency-Sensitive Pipelines: A Plus- or Flash-tier model is a better fit when extended-thinking traces add unnecessary overhead
  • Strict Token Budgets: A smaller model is a closer fit when per-session spend on a flagship Max model isn't justified
  • Built-In Autonomous Search: Qwen3-Max-Thinking is a stronger fit when integrated search and code interpreter tools are required

Conclusion

Qwen 3.7 Max extends the Qwen Max tier with an agent-first design that targets long-horizon tool use, multi-file coding, and office workflow automation. Routing through AI Gateway gives you a single integration surface, provider failover, and consolidated billing while you build against the latest generation in the Max line.