Zoom Achieves Record AI Assessment Score, Sparking Debate in Tech Community

Zoom Video Communications, best known for connecting remote workers during the pandemic, announced last week that it had achieved the highest score yet recorded on an AI assessment known as "Humanity's Last Exam." The announcement drew widespread surprise, skepticism, and interest from the tech community.
The company, headquartered in San Jose, reported that its AI system achieved a score of 48.1 percent on this challenging benchmark, surpassing Google's Gemini 3 Pro, which previously held the record with a score of 45.8 percent. Xuedong Huang, Zoom's chief technology officer, stated in a blog post, "Zoom has achieved a new state-of-the-art result on the challenging Humanity's Last Exam full-set benchmark, scoring 48.1%, which represents a substantial 2.3% improvement over the previous SOTA result."
However, this leap raises critical questions: How did a video conferencing company with no established history of training large language models suddenly outpace industry giants such as Google, OpenAI, and Anthropic on a benchmark designed to probe the cutting edge of machine intelligence? Perspectives vary: some view the result as an impressive engineering feat, while others call it a misleading picture of what the company has actually built.
Rather than developing its own large language model, Zoom used what it calls a "federated AI approach." The method routes queries to established models from companies such as OpenAI, Google, and Anthropic, then uses proprietary software to evaluate, combine, and refine their outputs. At the core of the framework is the "Z-scorer," a component that judges responses from the various models and selects the best output for a given task. Huang explained that the approach orchestrates different models so that they improve one another's reasoning through collaboration.
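Zoom has not published the internals of the Z-scorer, but the general pattern is straightforward to sketch. The Python snippet below is a minimal, hypothetical illustration of model federation: a query is fanned out to several underlying models, each response is scored, and the highest-scoring answer is returned. The model callables and the scoring heuristic are placeholders, not Zoom's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Candidate:
    model_name: str
    answer: str
    score: float


def score_answer(question: str, answer: str) -> float:
    """Placeholder quality score. A real scorer might use a judge model,
    self-consistency checks, or task-specific verifiers."""
    if not answer.strip():
        return 0.0
    # Toy heuristic: reward longer, more detailed answers, capped at 1.0.
    return min(len(answer.split()) / 100.0, 1.0)


def federate(question: str, models: Dict[str, Callable[[str], str]]) -> Candidate:
    """Send the question to every model, score each response, keep the best."""
    candidates = []
    for name, call in models.items():
        answer = call(question)
        candidates.append(Candidate(name, answer, score_answer(question, answer)))
    return max(candidates, key=lambda c: c.score)


if __name__ == "__main__":
    # Stand-in callables; a real deployment would wrap provider APIs here.
    models = {
        "model_a": lambda q: "A short answer.",
        "model_b": lambda q: "A longer answer that explains the reasoning step by step.",
    }
    best = federate("Summarize the last team meeting.", models)
    print(best.model_name, round(best.score, 3), best.answer)
```

In practice, the hard engineering lives in the scorer and in deciding when to consult multiple models at all, since fanning every query out to several providers multiplies cost and latency.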
The distinction matters in an industry where prestige and valuations hinge on developing frontier models. While major AI labs spend hundreds of millions of dollars training advanced systems, Zoom's result appears to stem from effectively integrating technologies that already exist.
Reactions from the AI community were immediately polarized. Max Rumpf, an AI engineer with experience in training language models, commented critically online. "Zoom strung together API calls to Gemini, GPT, Claude et al. and slightly improved on a benchmark that delivers no value for their customers," he noted, while acknowledging that the multi-model strategy is "actually quite smart and most applications should do this."
In contrast, some observers, such as developer Hongcheng Zhu, offered a more measured view. "To top an AI eval, you will most likely need model federation, like what Zoom did," he remarked, likening Zoom's strategy to common practice in competitive data science, where blending multiple models frequently yields better results than any single one.
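That practice, often called blending or ensembling, is easy to demonstrate. The sketch below, which assumes scikit-learn and uses synthetic data, averages the predicted probabilities of two ordinary classifiers; in competitions this kind of simple blend frequently edges out either model alone. The models and dataset here are illustrative stand-ins, not anything Zoom has described.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification task.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = [LogisticRegression(max_iter=1000), RandomForestClassifier(random_state=0)]
probas = []
for model in models:
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]
    probas.append(p)
    print(type(model).__name__, accuracy_score(y_te, p > 0.5))

# "Blending": average the predicted probabilities across models.
blend = np.mean(probas, axis=0)
print("Blend", accuracy_score(y_te, blend > 0.5))
```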
However, Rumpf highlighted a deeper concern: a potential misalignment between Zoom's efforts and its customers' actual needs. "Retrieval over call transcripts is not 'solved' by SOTA LLMs," he argued, suggesting that users care more about practical solutions than benchmark records.
Xuedong Huang, who joined Zoom from Microsoft, where he had worked since 1993, brings a distinguished background in AI development. He founded Microsoft's speech technology group and led advances in speech recognition and natural language processing, lending weight to Zoom's AI ambitions. Commenting on the new result, he said, "We have unlocked stronger capabilities in exploration, reasoning, and multi-model collaboration, surpassing the performance limits of any single model."
The benchmark, designed to rigorously challenge AI systems, draws on questions from a wide range of complex domains that demand nuanced understanding and problem-solving. Although a score of 48.1 percent might sound underwhelming in any other context, it represents meaningful progress on an exam where even the leading frontier models have yet to reach 50 percent.
Zoom's approach marks a departure from the model-centric strategies of dominant players such as OpenAI and Google. By building an integration layer rather than a frontier model of its own, Zoom aims to offer businesses the best available AI capabilities while reducing the risk of vendor lock-in.
Following the announcement, OpenAI revealed improvements with its GPT-5.2 model, underscoring the competitive dynamics at play. By building on technologies developed by other firms, Zoom has emerged as both a collaborator and a competitor within the AI landscape.