Chinese company Z.ai has released the open-weight GLM-5.2 model, which produced results comparable to some Anthropic Claude configurations in two separate studies of vulnerability discovery and incident investigation. The findings come from evaluations by Graphistry and Semgrep. They apply to specific task sets and do not establish overall parity between the models.

What the Graphistry test found

On June 23, Graphistry reported that OpenCode with GLM-5.2 solved 28 of 59 tasks in the private CyBT-CTF benchmark. Claude Opus 4.7 and 4.8 achieved the same result in comparable configurations. The researchers estimated that Claude completed the work 19% faster but cost more than 2.2 times as much for the same number of solved tasks.

However, the strongest configuration, Louie paired with Opus, solved 35 of 59 tasks, compared with 28 for OpenCode with GLM-5.2. The authors stressed that the agent harness, tools and prompting setup may have a larger effect on the outcome than the choice between these two models. They present the result as a starting point, not a universal ranking.
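To put the task counts on a common scale, the short Python sketch below converts them into solve rates. It uses only the figures reported above (28 and 35 solved out of 59). The configuration labels and the helper name `solve_rate` are illustrative choices, not names from Graphistry's report.

```python
# Task counts as reported for the 59-task CyBT-CTF benchmark.
TOTAL_TASKS = 59
solved = {
    "OpenCode + GLM-5.2": 28,
    "Louie + Opus": 35,
}


def solve_rate(solved_tasks: int, total_tasks: int = TOTAL_TASKS) -> float:
    """Return the fraction of benchmark tasks that were solved."""
    return solved_tasks / total_tasks


rates = {name: solve_rate(count) for name, count in solved.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")

# Difference between the two configurations, in percentage points.
gap_points = (rates["Louie + Opus"] - rates["OpenCode + GLM-5.2"]) * 100
print(f"Gap: {gap_points:.1f} percentage points")
```

That works out to roughly 47.5% against 59.3%, a gap of about 12 percentage points. Both sides of that gap differ in harness as well as model, which is consistent with the authors' caution about attributing it to the model alone.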

A separate Semgrep evaluation

Semgrep separately evaluated how well models could find insecure direct object reference, or IDOR, vulnerabilities caused by insufficient access-control checks. GLM-5.2, running with a minimal harness, reached a 39% F1 score and ranked above the Claude Code configurations included in the test.

Semgrep’s purpose-built Multimodal system performed better, reaching 61% with GPT-5.5 and 53% with Opus 4.8. The researchers explicitly limited their conclusion: this was one task, one dataset and one run. They said the order could change when models are tested against another vulnerability class.
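For readers unfamiliar with the metric, F1 is the harmonic mean of precision (the share of reported findings that are real) and recall (the share of real vulnerabilities that were found). The sketch below shows the standard calculation in Python. The counts are hypothetical and chosen only for illustration. They are not Semgrep's data, and the function name `f1_score` is an assumption of this example.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; 0.0 if nothing was found correctly."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 6 real IDORs found, 2 false alarms, 6 real IDORs missed.
example = f1_score(true_positives=6, false_positives=2, false_negatives=6)
print(f"precision = {6 / 8:.0%}, recall = {6 / 12:.0%}, F1 = {example:.0%}")
```

Because the harmonic mean is pulled toward the lower of the two values, a tool that reports few false alarms but misses half of the real issues still ends up with a modest F1.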

What is known about GLM-5.2

According to Z.ai’s official announcement, GLM-5.2 was introduced on June 16, 2026 as a model for long-horizon tasks and coding, with a context window of up to one million tokens. Its Hugging Face model card lists 753 billion parameters and an MIT license.

Publishing the weights allows users to download the model, run it on their own infrastructure and adapt it for particular tasks. Open weights do not mean that the developer has disclosed the training data or the full development pipeline. Cifrum.kz previously published a guide to running models locally with Ollama, which explains the practical side of this approach.

Why overall parity with Mythos has not been established

The original Perplexity overview describes the result as comparable to Anthropic’s Mythos. However, the numerical comparisons published by Graphistry and Semgrep primarily involve Claude Opus 4.7 and 4.8. Mythos does not appear as a separate, directly comparable model in the reported tables.

The narrower and supportable conclusion is that GLM-5.2 reached Claude Opus-level performance in several specific cybersecurity scenarios. These tests do not measure general reasoning across all domains, reliability in every kind of coding task or the ability to handle any security problem.

Why the result is drawing attention

Axios notes that stronger open-weight models can lower the cost of tools for defenders and enable local use without provider oversight. That creates new options for security teams while also raising concerns about potential misuse.

The news comes amid a broader debate over access to frontier systems. Cifrum.kz previously reported on the Anthropic CEO’s call for G7 countries to coordinate AI policy. The GLM-5.2 findings show that model evaluations increasingly depend not only on where a system was developed or how it is licensed, but also on the task, methodology and agent environment.

Sources: Z.ai, Hugging Face, Graphistry, Semgrep, Axios.

The image was generated with artificial intelligence for Cifrum.kz and is illustrative. It does not show a real interface or test results.

China’s GLM-5.2 matches Claude Opus in selected cybersecurity tests
