AI’s hidden weakness: Leading models struggle to collaborate effectively
A team of researchers from EPFL and Microsoft Research has discovered a striking weakness in today's most advanced artificial intelligence systems. The study, titled "The Collaboration Gap," reveals that even the most capable large language models (LLMs) struggle to work together effectively, a flaw that could stall the safe and scalable deployment of multi-agent AI systems.
The research evaluated 32 leading open- and closed-source AI models in both solo and collaborative tasks, introducing a new benchmark to measure how well agents communicate, align, and cooperate when solving problems under partial information. Despite remarkable individual reasoning ability, many models showed severe performance drops when required to collaborate with a copy of themselves.
When strong AI systems fail at teamwork
The authors designed a maze-solving benchmark to test how pairs of models interact when each holds only half of the information needed to find a solution. The experiment simulated real-world teamwork: each AI agent saw a different part of the maze and had to discuss moves turn by turn until they agreed on a path. A third model acted as an automated grader to assess whether the pair reached the goal successfully.
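The article does not reproduce the authors' evaluation harness, but the turn-taking protocol it describes can be sketched roughly as follows. The snippet below is a minimal, hypothetical illustration in Python: two agents, each holding only a partial view of the maze, exchange messages in alternation until one declares an agreed path, and a third model grades the outcome. The chat() stub, the maze encoding, and the prompts are assumptions for illustration, not the paper's actual code.

```python
def chat(model: str, prompt: str) -> str:
    """Stub for a single LLM call; swap in a real chat API client here."""
    raise NotImplementedError

def render(transcript):
    """Turn the (speaker, message) history into plain text for the next prompt."""
    return "\n".join(f"{speaker}: {message}" for speaker, message in transcript)

def collaborate(agent_a, agent_b, view_a, view_b, max_turns=10):
    """Two agents, each seeing only part of the maze, speak in alternation
    until one of them declares an agreed path."""
    transcript = []
    speakers = [(agent_a, view_a), (agent_b, view_b)]
    for turn in range(max_turns):
        model, view = speakers[turn % 2]          # alternate who speaks
        prompt = (
            f"Your partial view of the maze:\n{view}\n\n"
            f"Conversation so far:\n{render(transcript)}\n\n"
            "Discuss moves with your partner. Once you both agree on a full "
            "path from start to goal, end your message with 'FINAL: <path>'."
        )
        reply = chat(model, prompt)
        transcript.append((model, reply))
        if "FINAL:" in reply:                     # the pair claims agreement
            break
    return transcript

def grade(grader_model, transcript, full_maze):
    """A third model judges whether the agreed path actually solves the maze."""
    verdict = chat(
        grader_model,
        f"Full maze:\n{full_maze}\n\nDialogue:\n{render(transcript)}\n\n"
        "Reply PASS if the final agreed path is valid, otherwise FAIL.",
    )
    return "PASS" in verdict
```

In such a setup, pairing a model with itself reproduces the self-collaboration condition, and swapping agent_a and agent_b changes which model opens the conversation, the ordering effect the authors report below.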
The results revealed a startling phenomenon: the "collaboration gap." Models that solved the maze easily on their own often failed when paired with an identical copy of themselves. In many cases, smaller or distilled versions of large models collapsed almost completely under collaborative conditions. The finding challenges the assumption that communication and coordination emerge naturally from intelligence.
According to the study, performance depended heavily on interaction order, that is, on which model initiated the conversation. When a stronger agent opened the exchange, results improved sharply; when a weaker model led, the pair often slid into confusion and misalignment.
Relay inference: A new way to bridge the gap
To address the breakdown, the researchers proposed a strategy called "relay inference." The method allows a stronger model to "seed" the initial reasoning steps before handing over control to a weaker partner. This minimal priming produced substantial performance gains in both homogeneous and mixed pairings.
The team found that even a single early intervention from a capable model could boost collaboration outcomes. However, the benefit declined when weaker models dominated the early stages of communication, suggesting that first impressions in AI dialogue, much like in human interaction, set the tone for success or failure.
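The article does not include the authors' implementation, but the mechanics of relay inference can be sketched on top of the hypothetical harness above: the stronger model occupies one seat for its first few turns, then hands that seat to the weaker model. The seed_turns parameter is an illustrative assumption, as are the reused chat() and render() helpers.

```python
def relay_collaborate(strong_model, weak_model, view_a, view_b,
                      seed_turns=1, max_turns=10):
    """Sketch of relay inference: the strong model 'seeds' seat A for its
    first few turns, then hands the seat to the weaker model.
    Reuses the hypothetical chat() and render() helpers from the sketch above."""
    transcript = []
    for turn in range(max_turns):
        if turn % 2 == 0:                                   # seat A speaks
            model = strong_model if turn // 2 < seed_turns else weak_model
            view = view_a
        else:                                               # seat B speaks
            model, view = weak_model, view_b
        prompt = (
            f"Your partial view of the maze:\n{view}\n\n"
            f"Conversation so far:\n{render(transcript)}\n\n"
            "Continue the discussion; end with 'FINAL: <path>' once you agree."
        )
        reply = chat(model, prompt)
        transcript.append((model, reply))
        if "FINAL:" in reply:
            break
    return transcript
```

With seed_turns=1, this corresponds to the single early intervention described above; raising it hands more of the opening exchange to the stronger model.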
In experiments with heterogeneous pairs mixing models from OpenAI's GPT family, Google's Gemini, Anthropic's Claude, and xAI's Grok, the researchers observed consistent ordering effects and cross-family dynamics, and performance rarely reached the level of the strongest solo performer. Surprisingly, even within the same model family, cooperative efficiency depended more on which model started the dialogue than on model size or compute scale.
Collaboration as a distinct capability
The research delivers a deeper warning: collaboration is not an automatic by-product of intelligence. It is a distinct capability that current AI training methods do not adequately capture. The authors argue that improving collaboration should become a central design objective, on par with reasoning, creativity, and accuracy.
They highlight that many AI ecosystems, ranging from corporate assistants to autonomous agents, will soon rely on distributed systems where different models must share knowledge, negotiate uncertainty, and coordinate under partial visibility. Without reliable collaboration, such systems risk inefficiency, miscommunication, or failure.
The study further notes that distillation processes, often used to create smaller, faster models, appear to degrade collaborative competence disproportionately. While compression preserves factual knowledge, it may remove the subtle reasoning traits required for mutual understanding and grounding in dialogue.
Implications for the AI industry
The findings arrive at a pivotal moment, as companies race to build "agentic" AI architectures, systems made up of multiple specialized models interacting autonomously. Current frameworks, such as Anthropic's MCP, Google's A2A, and IBM's ACP, rely on predetermined communication protocols. The authors caution that such rigid designs will not scale to the open-world environments where flexible, real-time collaboration is needed.
The team's benchmark provides a measurable foundation for future research into AI–AI and AI–human cooperation. By quantifying the degree of collaborative degradation, it opens a new field of study that could reshape how next-generation AI systems are trained, evaluated, and deployed.
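The article does not state the paper's exact scoring formula, but the headline quantity can be read as the drop from a model's solo success rate to its success rate when paired with a copy of itself over the same set of mazes. The function below is a hedged illustration of that reading, not the paper's definition.

```python
def collaboration_gap(solo_outcomes, collab_outcomes):
    """Illustrative gap metric (an assumption, not the paper's exact definition):
    solo success rate minus self-collaboration success rate.
    Both inputs are lists of booleans, one grader verdict per maze."""
    solo_rate = sum(solo_outcomes) / len(solo_outcomes)
    collab_rate = sum(collab_outcomes) / len(collab_outcomes)
    return solo_rate - collab_rate   # positive value = performance lost to collaboration
```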
Moreover, the paper draws parallels to human teamwork, suggesting that collaboration failures in machines mirror social coordination challenges in people. Just as humans rely on shared conventions, grounding, and communication strategies, AI systems must develop similar mechanisms to function cohesively.
The researchers argue that overcoming the collaboration gap requires rethinking training paradigms, moving from individual optimization toward collective intelligence. This includes designing models explicitly for dialogue, role-taking, and shared representation, rather than expecting these abilities to emerge spontaneously.
- FIRST PUBLISHED IN: Devdiscourse