Menlo Park, United States. Meta artificial intelligence chief Alexandr Wang told employees that a model under development, codenamed Watermelon, had matched OpenAI’s GPT-5.5 on a set of industry benchmarks. Business Insider reported the remarks, citing two people familiar with the internal town hall.
The claim cannot yet be treated as a verified model comparison. Watermelon remains in training, and Meta has not named the benchmarks, released scores or provided the system for independent testing. The company has issued no official public announcement about Watermelon.
What Alexandr Wang reportedly said
Wang reportedly described Watermelon as the model following Avocado, the internal codename associated with Muse Spark. He also said Watermelon was using “an order of magnitude” more compute than the earlier project. In ordinary numerical usage, that suggests an increase of roughly ten times.
Business Insider did not specify what form of compute was being compared: total operations, accelerator count, training duration or a product of those measures. Without a methodology, the figure cannot be directly converted into model size, development cost or expected quality.
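A short sketch shows why the unit matters. It treats training compute as the product of accelerator count, run length, per-chip throughput and utilization, which is one common back-of-the-envelope approximation and not a method attributed to Meta. Every number below is hypothetical and chosen only to make the arithmetic visible.

```python
def training_flops(accelerators: int, days: float,
                   peak_flops_per_sec: float, utilization: float) -> float:
    """Approximate total training operations as a simple product.

    All inputs are illustrative assumptions, not reported figures.
    """
    seconds = days * 24 * 3600
    return accelerators * seconds * peak_flops_per_sec * utilization


# Hypothetical baseline run: 1,000 chips for 30 days.
baseline = training_flops(1_000, 30, 1e15, 0.4)

# Three different ways to reach roughly ten times the baseline.
more_chips = training_flops(10_000, 30, 1e15, 0.4)   # 10x accelerators
longer_run = training_flops(1_000, 300, 1e15, 0.4)   # 10x duration
mixed = training_flops(2_000, 150, 1e15, 0.4)        # 2x chips, 5x duration

for name, total in [("more chips", more_chips),
                    ("longer run", longer_run),
                    ("mixed", mixed)]:
    print(f"{name}: {total / baseline:.1f}x baseline")
```

All three scenarios count as "an order of magnitude" more compute, yet they imply very different hardware footprints and schedules.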
Meta and OpenAI did not provide public comments on the comparison. The most accurate description for now is that Meta leadership reported an internal milestone that remains to be verified.
Why matching benchmarks does not establish full parity
A score depends on task selection, dataset version, reasoning settings, the number of attempts and whether cost and response speed are counted. Two models can be close in mathematics while differing substantially in coding, tool use, multimodality or long-horizon reliability.
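The effect of the number of attempts can be illustrated with pass@k, a metric widely used in code-generation benchmarks. The sketch below uses the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them are correct. Nothing in the reporting says which metric, if any of this kind, Meta used; the sample counts are invented for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: chance that at least one of k samples,
    drawn without replacement from n generated samples (c correct), passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 10 samples generated, 3 of them correct.
single_attempt = pass_at_k(10, 3, 1)
five_attempts = pass_at_k(10, 3, 5)

print(f"pass@1 = {single_attempt:.3f}")
print(f"pass@5 = {five_attempts:.3f}")
```

The same underlying model scores about 0.30 with one attempt and about 0.92 with five, so two reported numbers are only comparable if the attempt count is disclosed.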
Watermelon’s training status adds another uncertainty. The final system can change after further training, behavioural tuning and safety evaluation. Until a technical report appears, it is not known whether the comparison used an intermediate checkpoint or a near-release configuration.
Cifrum.kz has previously explained why matching an AI rival on selected cybersecurity tests cannot be generalized to every capability. The same limit applies to the Watermelon claim.

What is known about GPT-5.5
OpenAI officially released GPT-5.5 on 23 April 2026. It published results covering agentic coding, computer use, professional tasks, scientific evaluations and cybersecurity. GPT-5.5 became available through ChatGPT, Codex and the API.
OpenAI’s published figures are also developer-reported and would benefit from outside replication. However, researchers have the benchmark names, the evaluation conditions and access to the model. No comparable evidence package is available for Watermelon.
GPT-5.6 has already moved the target
On 26 June, OpenAI began a limited preview of the GPT-5.6 series. Its flagship Sol model is reported by the company to improve on GPT-5.5 across several agentic, biology and cybersecurity tasks. Broader availability is planned later, while the initial preview is restricted to a small set of partners.
That makes GPT-5.5 a clear reference point, but no longer OpenAI’s newest. Even if Meta’s internal parity result is reproducible, it places Watermelon relative to the April model rather than the competitor’s entire current lineup.
From Muse Spark to Watermelon
Meta introduced Muse Spark on 8 April as the first model in a new series from Meta Superintelligence Labs. The company describes it as a compact, fast system for complex reasoning and multimodal tasks. Muse Spark powers Meta AI, and a larger next generation was officially in development at the time of the announcement.
The link between the Avocado codename and Muse Spark, as well as the Watermelon name, comes from media reports about internal projects. Meta’s official pages do not yet give the next generation a public name, release date or model card.
What ten times more compute could mean
More compute can support a larger model, more training data or a longer training run. Scaling does not guarantee a proportional quality gain. The outcome also depends on architecture, data, optimization algorithms and post-training.
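As a rough illustration of why the gain is not proportional, empirical scaling studies often model training loss as a power law in compute. The sketch below assumes loss proportional to compute raised to the power of minus alpha, and the exponent used is a hypothetical placeholder, not a value measured for Watermelon or any other model.

```python
def relative_loss(compute_multiplier: float, alpha: float) -> float:
    """Ratio of new loss to old loss after scaling compute,
    assuming loss is proportional to compute ** (-alpha)."""
    return compute_multiplier ** (-alpha)


ALPHA = 0.05  # hypothetical exponent, chosen only for illustration

ratio = relative_loss(10.0, ALPHA)
print(f"10x compute -> loss falls to {ratio:.3f} of its previous value")
```

Under that assumption, ten times more compute lowers the loss by roughly 11 percent rather than by a factor of ten, and how a change in loss translates into benchmark scores is a separate question.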
Meta’s own explanation of its computing infrastructure describes model training and serving as a combination of GPUs, custom chips, networks and data centers. A “compute” figure without a unit is therefore a scale indicator rather than a technical specification.
Meta expects up to $145 billion in capital expenditure
In its first-quarter results, Meta raised expected 2026 capital expenditure to $125–145 billion. It attributed the revision to higher component prices and additional data-center costs needed for future capacity.
The range includes principal payments on finance leases and covers company infrastructure broadly. It should not be described as Watermelon’s budget or assigned entirely to generative AI.
What would verify Meta’s claim
- benchmark names, dataset versions and complete run conditions;
- results for multiple models under the same compute budget;
- cost, latency and number of attempts used to produce each score;
- a system card describing limitations and safety evaluations;
- access for independent researchers or a public API.
Until those materials appear, Watermelon is best described as a promising but unverified model. The internal result may be an important milestone for Meta, but it does not yet establish a new balance of power in the market.
The model race also involves safety, effects on users and interpretation of system behaviour. Cifrum.kz separately examined why technology companies are studying possible AI consciousness without claiming it has been detected.
Sources: the Business Insider report as carried by Investing.com, Meta’s Muse Spark announcement, Meta’s first-quarter results, the GPT-5.5 announcement and the GPT-5.6 preview.
The lead image was created with artificial intelligence for Cifrum.kz as a conceptual editorial illustration. It does not depict actual Meta or OpenAI servers or verify benchmark results. The infographic was produced by Cifrum.kz.
