Collaboration gap exposes limits of large language models in real-world use

A new study from researchers at the University of Oxford highlights a growing mismatch between the promise of seamless human–artificial intelligence (AI) collaboration and the reality of unstable, effort-intensive workflows.

The study, titled "The Collaboration Gap in Human–AI Work: Grounding and Repair Conditions for Stable Collaboration," published in the Proceedings of the 24th EUSSET Conference on Computer-Supported Cooperative Work (ECSCW), introduces a conceptual framework explaining why collaboration with large language models (LLMs) often breaks down, despite strong standalone performance.

AI systems act like collaborators, but lack shared understanding

The study identifies a major paradox in human–AI interaction. While LLMs are increasingly described as collaborators or teammates, they often fail to support the basic conditions required for true collaboration. In practice, users frequently spend time rephrasing prompts, correcting outputs, and reconstructing missing assumptions to keep tasks on track.

This issue is framed through the concept of common ground, a foundational idea in communication theory that refers to shared understanding of goals, context, and assumptions. In human collaboration, common ground is built through continuous feedback, clarification, and mutual adjustment. The study finds that these processes are weakly supported in current AI systems, leading to breakdowns in coordination.

The research shows that collaboration fails not only when AI produces incorrect outputs, but also when users cannot reliably interpret those outputs or align them with the task at hand. Even high-performing models can struggle in collaborative settings because they do not actively maintain a shared representation of the problem.

This gap becomes especially visible in complex tasks that require iterative reasoning, context tracking, and adaptation. Unlike simple request-and-response interactions, these tasks demand continuous alignment between human and machine, something current systems are not designed to support effectively.

Three interaction modes reveal why collaboration breaks down

To explain the fragility of human–AI collaboration, the study introduces a framework based on three distinct interaction structures: one-shot assistance, weak collaboration, and grounded collaboration. Each represents a different level of shared understanding and distribution of effort in maintaining alignment.

One-shot assistance represents the simplest form of interaction. In this mode, users provide a prompt and receive a response, with little or no iterative engagement. This structure works well for straightforward tasks such as summarization or generating boilerplate text, but it does not support deeper collaboration because shared understanding remains minimal.
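
As a rough illustration, the one-shot pattern can be sketched in a few lines of Python. The `call_llm` function below is a hypothetical placeholder for any model client, not code from the study:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned string here."""
    return f"[model output for: {prompt[:60]}]"

def one_shot(prompt: str) -> str:
    # The entire interaction: one prompt in, one response out,
    # with no clarification, feedback, or repair.
    return call_llm(prompt)

print(one_shot("Summarize the meeting notes in three bullet points."))
```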

Weak collaboration emerges when interaction becomes iterative. Users refine prompts, request revisions, and attempt to guide the system toward desired outcomes. While this may appear collaborative, the study finds that the burden of maintaining alignment falls almost entirely on the human user. Users must diagnose errors, infer missing context, and steer the interaction manually.

This mode represents the core of the collaboration gap. The interaction looks like a partnership, but lacks the underlying mechanisms needed for stable coordination. As a result, collaboration remains inefficient and error-prone, even when the system produces useful outputs.
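
A sketch of the same kind, again using a hypothetical `call_llm` placeholder, shows where the effort sits in weak collaboration: the loop is iterative, but every judgment and every fix comes from the human:

```python
def call_llm(prompt: str) -> str:  # placeholder for a real model call
    return f"[model output for: {prompt[:60]}]"

def weak_collaboration(task: str, max_rounds: int = 5) -> str:
    output = call_llm(task)
    for _ in range(max_rounds):
        # Only the human judges whether the output matches the task.
        if input(f"Output:\n{output}\nAccept? (y/n) ").strip().lower() == "y":
            return output
        # The human must also diagnose the error and restate the missing context.
        fix = input("Describe what is wrong and what context was missed: ")
        output = call_llm(f"{task}\n\nPrevious attempt:\n{output}\n\nCorrection: {fix}")
    return output
```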

Grounded collaboration represents the ideal but less common scenario. In this mode, the interaction actively supports shared understanding through mechanisms such as clarification, signalling of assumptions, and mutual repair of errors. The system helps make its reasoning more visible and supports the user in maintaining alignment.
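
Grounded collaboration can be sketched as a two-step protocol, again with a hypothetical `call_llm` placeholder; the two-step structure is illustrative, not the study's implementation. The point is that misalignment is caught before work is produced rather than repaired afterwards:

```python
def call_llm(prompt: str) -> str:  # placeholder for a real model call
    return f"[model output for: {prompt[:60]}]"

def grounded_collaboration(task: str) -> str:
    # Step 1: the system externalizes its reading of the task so the
    # user can check alignment before any work is produced.
    assumptions = call_llm(
        f"Task: {task}\nState the assumptions you would make and ask one "
        "clarifying question before starting."
    )
    print(assumptions)
    answer = input("Confirm or correct these assumptions: ")
    # Step 2: work proceeds only on the jointly established common ground.
    return call_llm(f"Task: {task}\nAgreed context: {answer}")
```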

Most current human–AI interactions fall into the weak collaboration category. This explains why users often experience frustration and inefficiency despite the apparent capabilities of modern AI systems.

Repair burden falls heavily on users, limiting real collaboration

The study focuses on repair burden, the effort required to detect and correct misalignment during interaction. The researchers argue that the distribution of this burden is a critical factor in determining whether collaboration is stable or fragile.

In one-shot assistance, repair occurs after the output is produced, with users responsible for identifying and fixing errors. In weak collaboration, repair becomes continuous but remains largely human-driven. Users must repeatedly intervene to correct misunderstandings, making the process time-consuming and cognitively demanding.

In grounded collaboration, repair is more evenly distributed. The system actively supports error detection and correction, reducing the burden on the user. However, the study finds that current AI systems rarely achieve this level of interaction.

This imbalance has significant implications for productivity and user experience. Even when AI systems generate high-quality outputs, the effort required to maintain alignment can offset their benefits. The study suggests that improving collaboration requires not only better models but also better interaction design that reduces repair burden.

Design mechanisms offer pathway to stable human–AI workflows

Based on interviews with 16 designers, developers, and AI practitioners, the study identifies three key mechanisms that can improve grounding and reduce repair burden: scoping, signalling, and repair.

Scoping involves narrowing the task and defining clear boundaries for interaction. By breaking tasks into smaller, well-defined stages, users can reduce ambiguity and make it easier to maintain shared understanding. This approach limits the complexity of what the human and AI must coordinate.
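
As an illustration, scoping can be expressed as a staged loop. The stage list and the `call_llm` placeholder below are hypothetical, not taken from the study:

```python
def call_llm(prompt: str) -> str:  # placeholder for a real model call
    return f"[model output for: {prompt[:60]}]"

# An illustrative stage list for a report-writing task.
STAGES = [
    "Draft an outline with section headings only.",
    "Write the background section from the approved outline.",
    "Write the findings section from the approved outline.",
]

def scoped_run(stages: list[str]) -> list[str]:
    results = []
    for stage in stages:
        # Each stage is reviewed before the next begins, so any
        # misunderstanding stays small and cheap to fix.
        output = call_llm(stage)
        input(f"Stage output:\n{output}\nPress Enter to approve and continue.")
        results.append(output)
    return results
```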

Signalling focuses on making the system's understanding visible. This can include asking the AI to restate assumptions, summarize task progress, or highlight uncertainty. By exposing internal reasoning, signalling helps users assess whether the system is aligned with their goals.
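
One way to approximate signalling is a prompt pattern that asks for this information up front. The template wording and the `call_llm` placeholder below are illustrative assumptions, not the study's design:

```python
def call_llm(prompt: str) -> str:  # placeholder for a real model call
    return f"[model output for: {prompt[:60]}]"

SIGNALLING_TEMPLATE = (
    "{task}\n\n"
    "Before answering:\n"
    "1. Restate the task in your own words.\n"
    "2. List any assumptions you are making.\n"
    "3. Flag any part of your answer you are unsure about."
)

print(call_llm(SIGNALLING_TEMPLATE.format(
    task="Estimate the effort to migrate our reports to the new schema."
)))
```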

Repair mechanisms provide structured ways to correct errors, including revision, rollback, and clarification processes. Instead of relying on ad hoc fixes, these mechanisms integrate repair into the interaction design, making it a core feature rather than an afterthought.
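
A minimal sketch of structured repair, assuming the same hypothetical `call_llm` placeholder, keeps accepted outputs as checkpoints so rollback and targeted revision replace ad hoc fixes:

```python
def call_llm(prompt: str) -> str:  # placeholder for a real model call
    return f"[model output for: {prompt[:60]}]"

class RepairableSession:
    """Keeps accepted outputs as checkpoints so repair is structured:
    the user can roll back to a known-good state and request a targeted
    revision instead of patching errors ad hoc."""

    def __init__(self, task: str):
        self.task = task
        self.checkpoints: list[str] = []  # accepted outputs, oldest first

    def generate(self, instruction: str) -> str:
        draft = self.checkpoints[-1] if self.checkpoints else "(none yet)"
        return call_llm(f"{self.task}\nCurrent draft:\n{draft}\n{instruction}")

    def accept(self, output: str) -> None:
        self.checkpoints.append(output)  # record a known-good state

    def rollback(self) -> None:
        if self.checkpoints:
            self.checkpoints.pop()  # discard the latest accepted state
```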

The study argues that these mechanisms are not optional enhancements but essential conditions for stable collaboration. Without them, human–AI interaction remains prone to breakdowns and inefficiencies.

Collaboration gap reflects design limits, not just model capability

According to the researchers, the collaboration gap cannot be closed by improving model accuracy alone. Even highly capable systems can fail in collaborative settings if they do not support grounding and repair, which shifts the focus from purely technical performance to interaction design.

The study also cautions that collaboration should not be inferred from the mere presence of multi-turn interaction: just because a system supports dialogue does not mean it supports true collaboration.

Collaboration must be evaluated based on how well the system enables shared understanding and distributes the effort of maintaining alignment. This perspective challenges current assumptions about AI as a seamless collaborator and highlights the need for more human-centered design approaches.

Implications for the future of AI-assisted work

According to the study, developers should design systems that support grounding and repair. This includes improving transparency, enabling better feedback mechanisms, and reducing the cognitive load on users.

Organizations need to rethink how AI tools are integrated into workflows. Training, interface design, and task structuring all play a role in determining whether AI enhances or hinders productivity.

For policymakers and researchers, the study raises questions about accountability and trust in AI systems. If users must constantly correct and reinterpret outputs, the reliability of AI-assisted decision-making becomes a critical concern.
