GLM 4.7 FlashX

GLM 4.7 FlashX is the ultra-fast inference variant in Z.ai's GLM-4.7 generation, released January 19, 2026. Designed for latency-critical workloads, it provides the fastest response times in the GLM-4.7 family while retaining core coding and reasoning capabilities.

Reasoning, Tool Use, Implicit Caching

index.ts
import { streamText } from 'ai'

// Start a streaming text generation against GLM 4.7 FlashX.
const result = streamText({
  model: 'zai/glm-4.7-flashx',
  prompt: 'Why is the sky blue?',
})
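
The snippet above starts a stream but does not read it. In the AI SDK, the value returned by `streamText` exposes a `textStream` that can be iterated with `for await`. The following is a minimal sketch of draining such a stream; `collectText` and `fakeTextStream` are hypothetical names introduced here for illustration, and the fake stream stands in for `result.textStream` so the example runs without network access or credentials.

```typescript
// Hypothetical helper (not part of the AI SDK): drains any async-iterable
// stream of text chunks and returns the concatenated text.
async function collectText(
  chunks: AsyncIterable<string>,
  onChunk?: (chunk: string) => void,
): Promise<string> {
  let text = '';
  for await (const chunk of chunks) {
    onChunk?.(chunk); // e.g. render each chunk as soon as it arrives
    text += chunk;
  }
  return text;
}

// Stand-in for `result.textStream`, so this sketch runs offline.
async function* fakeTextStream(parts: string[]): AsyncGenerator<string> {
  for (const part of parts) {
    yield part;
  }
}

// With the real SDK, the call would be: collectText(result.textStream)
collectText(fakeTextStream(['Rayleigh ', 'scattering.'])).then((text) => {
  console.log(text);
});
```

For a latency-oriented model, passing an `onChunk` callback lets the caller display partial output immediately instead of waiting for the full response.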

More models by Z.ai

| Model | Context | Latency | Throughput | Input | Output | Cache Read | Cache Write | Web Search | Per Query | Capabilities | Providers | ZDR | No Training | Release Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1M | 2.2 s | 184 tps | $3.00/M | $10.25/M | $0.5/M | | | | +1 | wafer | | | 06/23/2026 |
| | 1M | 0.8 s | 200 tps | $1.40/M | $4.40/M | $0.26/M | | | | +1 | baseten, fireworks, wafer, +1 more | | | 06/16/2026 |
| | 205K | 0.5 s | 174 tps | $1.30/M | $4.30/M | $0.26/M | | | | +1 | baseten, deepinfra, fireworks, +3 more | | | 04/07/2026 |
| | 203K | 4.3 s | 45 tps | $1.20/M | $4.00/M | $0.24/M | | | | +1 | zai | | | 03/15/2026 |
| | 203K | 0.5 s | 81 tps | $0.80/M | $2.56/M | $0.16/M | | | | +1 | baseten, bedrock, deepinfra, +3 more | | | 02/12/2026 |
| | 205K | 0.1 s | 525 tps | $2.25/M | $2.75/M | $2.25/M | | | | +1 | baseten, bedrock, cerebras, +3 more | | | 12/22/2025 |
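
Prices in the table are quoted per million tokens, so estimating the cost of a request is simple arithmetic. The sketch below is illustrative only: the `Pricing` and `Usage` shapes and the `estimateCostUSD` function are hypothetical names, and it assumes that cached input tokens are billed at the cache read rate instead of the regular input rate, which the page does not state explicitly. The example prices are taken from the row listing $0.80/M input, $2.56/M output, and $0.16/M cache read.

```typescript
// Per-million-token prices in USD, as listed in one pricing table row.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
  cacheReadPerM: number;
}

// Token counts for a single request (hypothetical shape for this sketch).
interface Usage {
  inputTokens: number; // total prompt tokens, including cached ones
  cachedInputTokens: number; // portion of inputTokens served from cache
  outputTokens: number;
}

const TOKENS_PER_M = 1000000;

// Assumption: cached input tokens are charged at the cache read rate
// in place of the regular input rate.
function estimateCostUSD(pricing: Pricing, usage: Usage): number {
  if (usage.cachedInputTokens > usage.inputTokens) {
    throw new RangeError('cachedInputTokens cannot exceed inputTokens');
  }
  const uncachedInput = usage.inputTokens - usage.cachedInputTokens;
  const total =
    uncachedInput * pricing.inputPerM +
    usage.cachedInputTokens * pricing.cacheReadPerM +
    usage.outputTokens * pricing.outputPerM;
  return total / TOKENS_PER_M;
}

// Prices from the table row with $0.80/M input and $2.56/M output.
const examplePricing: Pricing = {
  inputPerM: 0.8,
  outputPerM: 2.56,
  cacheReadPerM: 0.16,
};

const exampleCost = estimateCostUSD(examplePricing, {
  inputTokens: 1000000,
  cachedInputTokens: 500000,
  outputTokens: 250000,
});

console.log(exampleCost.toFixed(2)); // prints "1.12"
```

In this example half of the prompt is served from cache, so the input side costs $0.40 + $0.08 instead of $0.80, and the 250K output tokens add $0.64.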