Z.ai’s GLM-5.2 Beats GPT-5.5 at One-Sixth the API Price

Z.ai’s open-source GLM-5.2 coding model lands within a few points of Claude Opus 4.8 and beats GPT-5.5 on long-horizon benchmarks, at one-sixth the API cost.

Published June 23, 2026

Henry Fox

GLM-5.2, a 744-billion-parameter open-source language model from Beijing-based Z.ai, launched on June 13, 2026. Vercel CEO Guillermo Rauch posted on X that he was “almost shocked” at how good GLM-5.2 is at coding. Matt Velloso, a former vice president at Meta, Google DeepMind, and Microsoft, said he used it for an entire working day and called it “the first open model that passes the bar as a daily driver.”

Z.ai released the model under an unrestricted MIT license and says it carries a 1 million token context window designed for long-running coding tasks. On coding benchmarks published by the company and by independent outlets, GLM-5.2 beats OpenAI’s GPT-5.5 on most long-horizon coding tests and trails Anthropic’s Claude Opus 4.8 by a few points.

Built for Long-Horizon Tasks

GLM-5.2 is a sparse mixture-of-experts model with 744 billion total parameters and roughly 40 billion active at any one time, per Z.ai’s open-source release. The 1 million token context window, up from the 200,000 tokens of the prior GLM-5, was designed to hold an entire mid-sized code repository and the long trajectories a coding agent leaves behind, according to Z.ai’s official launch blog post for GLM-5.2. Z.ai released GLM-5 in February 2026 and GLM-5.1 in April, per the company’s release notes, both aimed at long-horizon agentic tasks.

To make 1 million tokens affordable to run, Z.ai introduced a new attention shortcut called IndexShare, which reuses a single indexer across each group of four sparse attention layers and cuts per-token compute (FLOPs) by a factor of 2.9 at the maximum context length. The model also picks up a faster multi-token prediction layer that increases the acceptance rate of speculative decoding by up to 20 percent. Training was tuned for the long, branched traces that coding agents produce. Z.ai’s “slime” framework handled the agentic reinforcement learning, and the company added a dedicated anti-hack module to keep coding agents from gaming verifiable test cases. The result, Z.ai says, is a model that can stay on task across large-scale implementation, automated research, performance optimization, and complex debugging.
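The sharing pattern described for IndexShare can be pictured as a simple grouping of layers. The sketch below is an illustration only, not Z.ai's implementation: the function names and the 80-layer count are hypothetical, and it counts indexer passes rather than total FLOPs, so it does not reproduce the 2.9x figure.

```python
from math import ceil

# From the article: one indexer is reused across every four sparse attention layers.
SHARE_EVERY = 4

# Hypothetical layer count, chosen only for illustration; the article gives none.
NUM_SPARSE_LAYERS = 80


def indexer_group(layer: int, share_every: int = SHARE_EVERY) -> int:
    """Return the index of the shared indexer that a given layer would reuse."""
    return layer // share_every


def indexer_passes(num_layers: int, share_every: int = SHARE_EVERY) -> int:
    """Return how many indexer passes are needed when layers share in groups."""
    return ceil(num_layers / share_every)


if __name__ == "__main__":
    # Layers 0-3 map to indexer 0, layers 4-7 to indexer 1, and so on.
    print([indexer_group(layer) for layer in range(8)])
    # Passes with sharing versus one pass per layer without it.
    print(indexer_passes(NUM_SPARSE_LAYERS), "vs", NUM_SPARSE_LAYERS)
```

Under these assumptions, sharing cuts the number of indexer passes to a quarter; how much of the total per-token compute that represents depends on details the article does not provide.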

A few specs at a glance:

744 billion total parameters (about 40 billion active at any one time)
1 million token context window, up from 200,000 in GLM-5
2.9x reduction in per-token compute (FLOPs) at 1M context
Up to 20% higher speculative-decoding acceptance rate from the new multi-token prediction layer

Coding Benchmarks, Compared

Z.ai calls GLM-5.2 the highest-ranked open-source model across the long-horizon coding benchmarks it has tested. The biggest jump from its predecessor GLM-5.1 came on Terminal-Bench 2.1, where GLM-5.2 scored 81.0 versus GLM-5.1’s 62.0. On SWE-bench Pro, GLM-5.2 posted 62.1 against GLM-5.1’s 58.4.

GLM-5.2’s 81.0 on Terminal-Bench 2.1 lands within a few points of Claude Opus 4.8 (85.0) and GPT-5.5 (84.0), and ahead of Google’s Gemini 3.1 Pro (74.0). Z.ai’s blog also reports results on FrontierSWE, a benchmark that measures whether an agent can finish open-ended technical projects at the scale of hours to tens of hours, spanning systems optimization, large-scale code construction, and applied ML research. There, GLM-5.2 hit 74.4 percent, 1.8 percentage points ahead of GPT-5.5 (72.6 percent) and 0.7 points behind Claude Opus 4.8 (75.1 percent).

A snapshot of how the open model stacks up against the closed frontier on coding:

Benchmark	GLM-5.2	Claude Opus 4.8	GPT-5.5
Terminal-Bench 2.1	81.0	85.0	84.0
SWE-bench Pro	62.1	–	58.6
FrontierSWE	74.4%	75.1%	72.6%
MCP-Atlas	77.0	77.8	75.3
Humanity’s Last Exam (w/ tools)	54.7	57.9	52.2
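The margins in the table are easy to check by subtraction. The following is a minimal sketch that copies the scores from the table and computes how far GLM-5.2 sits ahead of (positive) or behind (negative) each rival; the dictionary layout and the function name are illustrative choices, not anything published by Z.ai.

```python
# Scores copied from the comparison table; None marks a score the table leaves blank.
SCORES = {
    "Terminal-Bench 2.1": {"GLM-5.2": 81.0, "Claude Opus 4.8": 85.0, "GPT-5.5": 84.0},
    "SWE-bench Pro": {"GLM-5.2": 62.1, "Claude Opus 4.8": None, "GPT-5.5": 58.6},
    "FrontierSWE": {"GLM-5.2": 74.4, "Claude Opus 4.8": 75.1, "GPT-5.5": 72.6},
    "MCP-Atlas": {"GLM-5.2": 77.0, "Claude Opus 4.8": 77.8, "GPT-5.5": 75.3},
    "Humanity's Last Exam (w/ tools)": {"GLM-5.2": 54.7, "Claude Opus 4.8": 57.9, "GPT-5.5": 52.2},
}


def glm_margins(benchmark: str) -> dict[str, float]:
    """Return GLM-5.2's score minus each rival's score, skipping blank entries."""
    row = SCORES[benchmark]
    glm = row["GLM-5.2"]
    return {
        rival: round(glm - score, 1)
        for rival, score in row.items()
        if rival != "GLM-5.2" and score is not None
    }


if __name__ == "__main__":
    for name in SCORES:
        print(name, glm_margins(name))
```

On these numbers GLM-5.2 leads GPT-5.5 on four of the five rows and trails Claude Opus 4.8 on every row where an Opus score is listed, by between 0.7 and 4.0 points.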

By Z.ai’s account, GLM-5.2 ranks second only to the Opus series on the long-horizon tests the company has run. Terminal-Bench 2.1 is the exception in the table above: GPT-5.5 also finishes ahead there, 84.0 to 81.0.

Silicon Valley Reacts

The reaction inside the developer community landed within hours. Rauch posted: “Genuinely impressed, almost shocked, at how good GLM-5.2 by @zai_org is at coding. This changes things.”

Velloso said on X that he ran the model for an entire day without missing the closed-source frontier. The Artificial Analysis evaluation team placed GLM-5.2 between GPT-5.5 and Claude Opus 4.8 on its new agentic knowledge-work benchmark, AA-Briefcase. Z.ai also pushed availability aggressively: the model is free via Hugging Face Inference Providers for a limited window, with local GGUF support via llama.cpp and Unsloth.

“First open model that passes the bar as a daily driver. Things are not going to be the same.” (Matt Velloso, on X)

The coding tools moved just as fast. Kilo Code confirmed day-one integration on X, with the team writing that the 1M context window and Max effort mode were both live in their tool. Cline IDE called the release “a game changer,” noting GLM-5.2 is the first open-weights model to cross 80 percent on Terminal-Bench. Jeremy Howard, the fast.ai co-founder, said the model was “at least as good as Opus 4.8 and GPT 5.5” for his use, with the caveat that it lacks vision support. Velloso’s full post on GLM-5.2 as a daily driver drew broad attention on X.

The Cost Gap

Z.ai charges $1.40 per million input tokens and $4.40 per million output tokens through its API, so one million tokens of each costs a combined $5.80. OpenAI’s GPT-5.5 lists at $5.00 input and $30.00 output, for a combined $35.00; Anthropic’s Claude Opus 4.8 lists at $5.00 and $25.00, for $30.00. VentureBeat’s pricing snapshot put GLM-5.2’s total cost at about one-sixth that of GPT-5.5.
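The combined figures are simply the input price plus the output price, which assumes a job that uses equal amounts of each. A rough sketch of the arithmetic follows, using the list prices quoted above; the function names and the 900,000-input, 100,000-output workload are hypothetical, picked only to show that the ratio shifts with the token mix.

```python
# List prices in USD per million tokens, as quoted in the article.
PRICES = {
    "GLM-5.2": {"input": 1.40, "output": 4.40},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of a job with the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def cost_ratio(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return how many times more the job costs on `model` than on GLM-5.2."""
    baseline = job_cost("GLM-5.2", input_tokens, output_tokens)
    return job_cost(model, input_tokens, output_tokens) / baseline


if __name__ == "__main__":
    # The article's "combined" figure: one million tokens in, one million out.
    for model in PRICES:
        print(model, round(job_cost(model, 1_000_000, 1_000_000), 2))
    # Hypothetical input-heavy job, as a long-context coding agent might produce.
    for model in ("GPT-5.5", "Claude Opus 4.8"):
        print(model, round(cost_ratio(model, 900_000, 100_000), 2))
```

At an even split, GPT-5.5 comes out about 6.0 times and Claude Opus 4.8 about 5.2 times the GLM-5.2 price. For the hypothetical input-heavy job the gap narrows to roughly 4.4 and 4.1 times, because the largest price difference is on output tokens.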

For developers who prefer subscription billing, Z.ai launched the GLM Coding Plan in three tiers: Lite at $12.60 per month, Pro at $50.40, and Max at $112.00 per month. The Pro tier offers five times the usage allowance of Lite, and Max offers twenty times Lite plus dedicated peak-hour resources. The Lite tier is geared toward lightweight iteration on small repositories, the Pro tier toward mid-sized repositories, and the Max tier toward heavy workloads.
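Because the tiers scale usage faster than price, the effective cost per Lite-sized allowance falls as the tier rises. The sketch below works that out from the prices and multiples quoted above; treating one Lite allowance as the unit, and the names used, are assumptions made for illustration.

```python
# Monthly price in USD and usage allowance as a multiple of Lite, per the article.
PLANS = {
    "Lite": {"usd_per_month": 12.60, "usage_multiple": 1},
    "Pro": {"usd_per_month": 50.40, "usage_multiple": 5},
    "Max": {"usd_per_month": 112.00, "usage_multiple": 20},
}


def usd_per_lite_unit(plan: str) -> float:
    """Return the monthly price divided by the allowance, in Lite-sized units."""
    tier = PLANS[plan]
    return tier["usd_per_month"] / tier["usage_multiple"]


if __name__ == "__main__":
    for plan in PLANS:
        print(plan, round(usd_per_lite_unit(plan), 2))
```

This works out to $12.60 per unit on Lite, $10.08 on Pro, and $5.60 on Max, before counting the dedicated peak-hour resources that only Max includes.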

A few pricing reference points from the release:

$5.80 per million tokens: combined GLM-5.2 API cost
$30.00 per million tokens: combined Claude Opus 4.8 API cost
$35.00 per million tokens: combined GPT-5.5 API cost
$12.60 per month: GLM Coding Plan Lite tier
$112.00 per month: GLM Coding Plan Max tier

The combination of an MIT license and the price gap matters most for enterprises. A company can download the open-source weights and benchmark code from Hugging Face, run the model on its own hardware, and remove its dependency on a US frontier lab’s roadmap, terms of service, and pricing. Z.ai’s managed API and Coding Plan options add paid paths for teams that want a hosted experience.

Anthropic Sounds the Alarm

Anthropic, one of the labs GLM-5.2 most directly challenges, has argued in a paper of its own that the US still holds a real but narrowing lead. The paper, “2028: Two scenarios for global AI leadership,” says American and allied AI labs could “lock in a 12-24 month lead in frontier capabilities” if policymakers tighten export controls and crack down on what Anthropic calls “distillation attacks,” the practice of using a stronger model’s outputs to train a smaller “student” model.

Anthropic’s 2028 paper is explicit about the mechanism. It credits bipartisan US export controls with slowing Chinese AI progress and accuses “AI labs in China” of “carrying out large-scale distillation attacks that harvest the innovations of US models in order to mimic their capabilities.” The paper warns that “the window of opportunity to lock in that lead will not necessarily remain open for long.”

The lab has separately accused three named Chinese AI labs of running “industrial-scale” distillation attacks on its Claude models, a charge the labs deny, the Financial Times reported. Z.ai was not among the labs named in that earlier report, the Latent Space AINews newsletter noted.

The week GLM-5.2 launched, the Trump Administration issued an export control directive prohibiting foreign nationals from using Anthropic’s Claude Fable 5 model, VentureBeat reported. Anthropic responded by taking the models offline for all users, according to the same report. Z.ai shipped its model the same week, with no regional limits on the weights, available for download on Hugging Face.

For enterprises watching the US-China contest, the timing sharpens Z.ai’s pitch: a frontier-grade open-weights model, downloadable, with no regional limits, available the same week one of the most capable closed US models is being fenced off. Anthropic’s own 2028 paper on US-China AI leadership scenarios names export controls and anti-distillation enforcement as the two levers the US still has.

An Echo of January 2025

The reception rhymes with January 2025, when DeepSeek released its R1 reasoning model. DeepSeek-R1 dropped on January 20, 2025, and by January 27 it had passed ChatGPT as the most downloaded free app on the US iOS App Store, per Wikipedia.

President Donald Trump called it “a wake-up call” for US industry, France24 reported at the time. AI infrastructure stocks sold off on the news. GLM-5.2 arrives under a permissive open license, like its open-weight peers, and has stirred the same unsettled mood that DeepSeek’s R1 did.

Frequently Asked Questions

What is GLM-5.2?

GLM-5.2 is an open-source large language model released by Beijing-based Z.ai on June 13, 2026. The model has 744 billion total parameters with about 40 billion active at a time, supports a 1 million token context window, and ships with weights under an MIT license with no regional limits.

How does GLM-5.2 compare to GPT-5.5 and Claude Opus 4.8?

Z.ai’s published numbers and third-party benchmarks put GLM-5.2 within a few points of Claude Opus 4.8 on Terminal-Bench 2.1 (81.0 versus 85.0) and ahead of GPT-5.5 on long-horizon tests including FrontierSWE (74.4% versus 72.6%). Artificial Analysis’s AA-Briefcase ranked it between GPT-5.5 and Claude Opus 4.8 on agentic knowledge work.

Is GLM-5.2 free to use?

The model weights are free under an MIT license and can be downloaded from Hugging Face. Z.ai also sells API access at $1.40 per million input tokens and $4.40 per million output tokens, plus a GLM Coding Plan that starts at $12.60 per month for the Lite tier.

What does a 1 million token context window do?

A 1 million token context window can hold an entire mid-sized code repository and the working memory of a coding agent’s recent tool calls, Z.ai says. The 1 million token window is up from the 200,000 tokens of the prior GLM-5, and the company designed it to keep the model on task across multi-hour refactors and complex debugging sessions.

Why is GLM-5.2 being compared to DeepSeek’s R1?

Both are open-weight Chinese models that landed with a strong Silicon Valley reaction. DeepSeek’s R1 went public on January 20, 2025, and President Donald Trump called it “a wake-up call” for US industry. GLM-5.2, a coding-focused model, is the next Chinese open-weight release to draw the same kind of attention.