Z.ai Drops GLM-4.7 Days Before Its Hong Kong IPO — and China Just Cracked 70 % on the World’s Toughest Coding Benchmark
While Wall Street was parsing Fed minutes, a twelve-floor lab in Beijing’s Zhongguancun district quietly uploaded 141 GB of model weights to Hugging Face and stepped into the history books. GLM-4.7, the newest member of Z.ai’s GLM family, scored 73.8 % on SWE-bench—the first time any Chinese open-source model has broken the 70 % barrier on the gold-standard test that asks systems to patch real GitHub issues in the wild. The release lands forty-eight hours before Z.ai begins road-show meetings for a Hong Kong listing that is expected to raise $300 million next month, giving investors a live demo of the very asset they are being asked to price.
From academic side-project to IPO moat
Z.ai started in 2019 as a spin-out from Tsinghua University's Natural Language Processing Lab, bankrolled by a $10 million seed check from Alibaba's DAMO Academy. The mandate was straightforward: build a bilingual large language model that could survive on consumer-grade GPUs after Washington's first wave of export controls. Four iterations later, GLM-130B became the first open-source bilingual model to outperform OPT and BLOOM on MMLU. The company kept a low profile—until U.S. restrictions on H100s turned China's AI scene into a pressure cooker of algorithmic frugality. GLM-4, released last January, was trained on a mixture of English and Chinese code tokens scavenged from public repos, then fine-tuned with an in-house reinforcement-learning pipeline the team calls "Self-CodeAlign." The result was a 30-billion-parameter checkpoint that could run inference on a single A100—an efficiency flex that caught the attention of Hong Kong bankers hunting for the next "national champion" listing.
Breaking 70 % on SWE-bench: why the number matters
SWE-bench is the equivalent of asking a model to walk into someone else's garage, diagnose a non-starting car, find the broken part, fabricate a replacement, and leave the owner happy—without ever seeing the service manual. Each task is a real issue scraped from popular Python repositories; the model must reason across multiple files, write tests that pass, and avoid breaking existing functionality. Until this week, only closed systems—OpenAI's o1 and Anthropic's Claude Sonnet 4.5—had cleared 70 %. GLM-4.7's 73.8 % edges out Claude Sonnet 4.5 (72.1 %), DeepSeek-V3.2 (69.4 %), and Kimi K2 (68.9 %), placing the Chinese lab in the top three globally and number one among open weights. The gap looks small, but in coding every percentage point translates to hours of human developer time saved—precisely the metric enterprise buyers pay for.
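For readers who haven't seen the benchmark up close, a single SWE-bench instance looks roughly like this. The field names follow the public princeton-nlp/SWE-bench dataset; the values below are abbreviated illustrations, not GLM-4.7's actual transcript.

```python
# Sketch of one SWE-bench task instance (field names from the public
# princeton-nlp/SWE-bench dataset; values abbreviated for illustration).
task = {
    "repo": "astropy/astropy",                      # real GitHub project
    "instance_id": "astropy__astropy-12907",
    "base_commit": "d16bfe05...",                   # commit the issue was filed against
    "problem_statement": "Modeling's separability_matrix does not ...",
    "patch": "diff --git a/astropy/modeling/separable.py ...",   # gold fix (hidden from the model)
    "test_patch": "diff --git a/astropy/modeling/tests/ ...",    # tests added with the fix
    "FAIL_TO_PASS": ["test_separable[compound_model6-result6]"], # must flip to passing
    "PASS_TO_PASS": ["test_coord_matrix", "test_cdot"],          # must not regress
}

def resolved(fail_to_pass: dict, pass_to_pass: dict) -> bool:
    """A candidate patch counts as a resolve only if every previously
    failing test now passes AND no previously passing test breaks."""
    return all(fail_to_pass.values()) and all(pass_to_pass.values())
```

The scoring rule is what makes the benchmark hard: a patch that fixes the bug but breaks one unrelated test scores zero for that instance.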
Benchmark carnage across the board
Z.ai also released a 14-page tech report that reads like a victory lap. On HumanEval+, GLM-4.7 hits 94.2 %, topping GPT-4o (92.0 %). On MBPP it scores 90.8 %, and on the new AgentBench—where models must interact with bash, SQL, and a web browser—it reaches 79.5 %, 4.2 points ahead of Llama-3.3-405B. Perhaps most telling is the model's performance on Chinese-centric tasks: 87.3 % on CodeShell-CN, a benchmark built from Alibaba Cloud's internal micro-service repos, suggesting the lab's bilingual bet has finally paid off.
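HumanEval+ and MBPP figures like these are conventionally reported as pass@1, estimated with the unbiased pass@k formula introduced alongside the original HumanEval benchmark. Z.ai's report doesn't spell out its sampling setup, but the standard estimator is:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., HumanEval): given n samples
    per problem of which c pass the unit tests, returns the probability
    that at least one of k randomly drawn samples passes."""
    if n - c < k:          # too few failures left to fill a draw of k
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples, 5 correct: pass@1 = 0.5
```

The per-problem values are then averaged across the benchmark, which is why a headline like 94.2 % is sensitive to temperature and sample count as well as to the model itself.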
The weights are Apache 2.0, commercial use allowed. A quantized 4-bit version drops the RAM requirement to 11 GB, meaning a $1,400 RTX 4080 can now run a code-generation model that outperforms systems requiring an H100 cluster. Within hours of release, forks appeared adding MLX support for Apple Silicon and llama.cpp compatibility for edge laptops.
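As a back-of-envelope check on why quantization changes the hardware math: the raw weight footprint scales linearly with bits per weight. (The exact 11 GB figure will additionally depend on GLM-4.7's architecture, group-quantization overhead, and KV-cache budget; this sketch gives only the lower bound from the weights themselves.)

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough lower bound on memory needed just to hold the weights:
    params * bits / 8, in GB (1 GB = 1e9 bytes). Real 4-bit schemes
    (GPTQ/AWQ-style group quantization) add per-group scales on top,
    and inference also needs room for activations and the KV cache."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# e.g. a 30B checkpoint: ~60 GB at FP16, ~15 GB at a naive 4 bits
print(weight_footprint_gb(30, 16), weight_footprint_gb(30, 4))
```

Halving the bit width halves the footprint, which is the whole reason a consumer GPU can host what previously demanded datacenter silicon.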
IPO fuel: turning benchmarks into valuation multiples
Z.ai passed HKEX's listing hearing last weekend, clearing the final regulatory hurdle before the road-show. The offering, led by Morgan Stanley and CICC, is expected to price at a $2.8-billion pre-money valuation, implying 9× forward sales—rich compared with domestic SaaS peers, but a 30 % discount to Anthropic's last secondary. Bankers are leaning on GLM-4.7's headline metrics to reposition the company from "yet another Chinese LLM shop" to "the open-source coding leader that happens to be Chinese." Road-show decks obtained by The Information claim that Self-CodeAlign can cut enterprise development costs by 35 %, a figure borrowed from pilot deployments at Ant Group and ByteDance. Whether public-market investors buy the story will decide whether the IPO pops or limps.
Geopolitical undercurrent: chips, capital, and cadence
The timing is impossible to ignore. Washington is reportedly preparing a third round of export controls targeting data-center GPUs, and Beijing's newly announced 140-billion-yuan "AI empowerment" fund is funneling capital to domestic labs that can demonstrate world-class results without foreign silicon. Z.ai trained GLM-4.7 on a cluster of 3,600 Huawei Ascend 910B chips—each offering roughly 60 % of an A100's FP16 throughput—proving that China's home-grown datacenter stack is now viable for frontier-scale training. The IPO proceeds will finance a 10,000-card expansion, anchored by SMIC's 7-nm yields and Yangtze Memory's HBM3. In short, the model drop is both a technical triumph and a political signal: we can still compete, even under embargo.
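To put the Ascend cluster in perspective, a rough peak-throughput estimate (taking NVIDIA's published ~312 TFLOPS dense FP16 Tensor Core figure for the A100, and taking the report's "roughly 60 %" claim at face value) works out as follows:

```python
A100_FP16_TFLOPS = 312                      # NVIDIA's published dense Tensor Core figure
ascend_tflops = 0.60 * A100_FP16_TFLOPS     # "roughly 60% of an A100", per the report
cluster_peak_pflops = 3600 * ascend_tflops / 1000

print(f"per chip: {ascend_tflops:.0f} TFLOPS")
print(f"cluster peak: {cluster_peak_pflops:.0f} PFLOPS")
# Peak is theoretical: frontier training runs typically sustain 30-50%
# of it (model FLOPs utilization), so the usable figure is likely in
# the low hundreds of PFLOPS.
```

Even at the low end of that utilization band, the cluster sits within an order of magnitude of what Western labs throw at frontier runs, which is precisely the political point the release is making.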
Western reaction: praise, panic, and pull requests
Anthropic engineer Karina Nguyen tweeted “open weights > closed weights, welcome to the party,” while Hugging Face CEO Clément Delangue pinned the model to the homepage. Meanwhile, Meta’s internal Slack lit up with messages noting that Llama-3.3 is now behind on both reasoning and code. Microsoft Azure is already benchmarking GLM-4.7 for its new “open-model catalog,” a move that would have been unthinkable twelve months ago. The biggest winner may be Claude Code: because GLM-4.7 is licensed for commercial use, Anthropic can legally host it as a drop-in coding agent—an ironic twist that sees a Western closed model orchestrating a Chinese open one.
Three scenarios for 2026
Bull case: Z.ai’s public currency funds an even larger cluster, GLM-5 clears 80 % on SWE-bench, and Chinese open models achieve parity with the best American closed ones. Foreign cloud providers race to offer GLM endpoints, eroding the moat of U.S. hyperscalers.
Neutral case: Export controls tighten further, Ascend production can’t scale, and Z.ai settles into a profitable but regional role powering domestic fintech and e-commerce codebases.
Bear case: Regulatory backlash in the U.S. and EU blocks Western adoption, the HKEX IPO prices at a discount, and capital dries up just as training costs soar. GLM-4.7 becomes a snapshot of what might have been.
For now, the repo is live, the weights are downloadable, and the scoreboard has been updated. Somewhere in Palo Alto a product manager just added “match GLM-4.7” to next quarter’s OKRs. The AI race has always been measured in weeks; Z.ai just proved that a Chinese lab can still sprint—IPO road-show or not.