Hybrid AI tutoring could bridge gap between efficiency and understanding

CO-EDP, VisionRI | Updated: 28-10-2025 17:11 IST | Created: 28-10-2025 17:11 IST

While generative AI tutors are transforming science and engineering education, students still favor straightforward answers over reflective, concept-driven learning, according to a new study by researchers from the University of Ljubljana.

Published in Sustainability, the study, "Custom Generative Artificial Intelligence Tutors in Action: An Experimental Evaluation of Prompt Strategies in STEM Education," explores how different AI prompting strategies affect student learning behavior and pedagogical quality in laboratory-based electrical engineering courses.

AI tutors in the lab: From prototype to practice

The research team developed a configurable generative AI tutoring prototype designed to simulate teacher-like interaction and adapt to laboratory instruction needs. Using the Claude 3.5 Sonnet model, the system processed 208 student–AI exchanges and compared seven prompting strategies: neutral, persona, template, chain-of-thought, few-shot, game-based, and flipped.
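The paper does not publish its prompt texts, but the setup it describes can be sketched in a few lines: each strategy becomes a system prompt, and the student's question becomes the user turn of a chat request. The strategy wordings and the model identifier below are illustrative assumptions, not the authors' actual configurations.

```python
# Hypothetical sketch of the seven prompting strategies compared in the study.
# The wording of each strategy is an assumption for illustration only.
STRATEGIES = {
    "neutral": "Answer the student's question about the lab exercise.",
    "persona": "You are a patient electrical-engineering lab instructor.",
    "template": "Answer using: 1) Goal, 2) Steps, 3) Safety note, 4) Expected result.",
    "chain_of_thought": "Reason step by step out loud before giving the final answer.",
    "few_shot": "Here are two worked examples of good answers. Answer in the same style.",
    "game_based": "Frame the answer as a short lab quest with points for each step.",
    "flipped": "Do not answer directly; ask guiding questions so the student derives it.",
}

def build_request(strategy: str, student_query: str,
                  model: str = "claude-3-5-sonnet") -> dict:
    """Assemble an Anthropic-style chat payload: the chosen strategy acts as
    the system prompt, the student's question as the single user message."""
    return {
        "model": model,
        "max_tokens": 1024,
        "system": STRATEGIES[strategy],
        "messages": [{"role": "user", "content": student_query}],
    }
```

Swapping the `strategy` key while holding the student query fixed is, in essence, the controlled comparison the researchers ran across 208 exchanges.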

This approach addressed three research questions: the nature of student queries (RQ1), the instructional quality of AI responses (RQ2), and student preferences among prompt styles (RQ3). The study found that most student prompts were procedural (59.1%) or factual (26.4%), revealing that learners primarily used the tutor for immediate help rather than conceptual reasoning or reflection.

The tutoring system was locally hosted for privacy and transparency, giving teachers full control over workflows and content. This design aligned with the broader educational goal of sustainable, human-centered AI integration, consistent with UN Sustainable Development Goal 4 (Quality Education).

Prompt strategies: The battle between clarity and depth

The study compared the seven prompting strategies, each representing a different educational philosophy. While all strategies produced factually correct outputs, their pedagogical depth varied considerably.

  • Template and persona prompts, which imposed clear structure and a teacher-like tone, were ranked highest by students for clarity and usefulness.

  • Chain-of-thought and flipped prompts, which encouraged reasoning and reflection, were rated highest by researchers for pedagogical quality but least preferred by students.

  • Game-based and few-shot prompts were consistently ranked lowest, highlighting that playful or example-heavy approaches were less effective in time-pressured lab contexts.

The findings show a statistically significant difference (p < 0.001) among strategies, with the template configuration earning the strongest preference (average rank 2.76) and the game-based one the weakest (average rank 5.93). Students valued concise, task-oriented answers that supported quick progress, while educators valued strategies that fostered deeper reasoning.

This mismatch forms what the authors term the "preference–pedagogy gap", a divide between what students want and what helps them learn best. It echoes broader concerns in AI-assisted education, where fluency and speed can overshadow reflection and understanding.

Bridging the preference–pedagogy gap

The researchers argue that future AI tutors must move beyond static prompting models to integrate adaptive, hybrid, and retrieval-augmented generation (RAG) systems. By combining the structural clarity of templates with the reflective scaffolding of chain-of-thought or flipped strategies, such systems could balance immediacy and depth.

The study recommends hybrid prompting, where multiple strategies are layered to exploit their complementary strengths. For example, persona prompts can boost engagement, templates ensure structure, and short follow-up questions trigger reflection. This multi-step orchestration can make AI tutors both efficient and educationally meaningful.
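The layering idea can be sketched as simple prompt composition: ordered fragments concatenated into one system prompt, with the reflective nudge always last. The fragment texts below are assumptions for illustration, not wording from the paper.

```python
# Illustrative sketch of the "hybrid prompting" the study recommends:
# layering persona, template, and a reflection trigger into one system prompt.
# The layer texts are assumptions, not the authors' configurations.
PERSONA = "You are a supportive electrical-engineering lab instructor."
TEMPLATE = ("Structure every answer as: 1) Direct answer, "
            "2) Key concept involved, 3) One common mistake to avoid.")
REFLECTION = ("End with one short question asking the student to predict "
              "what would change if a parameter were different.")

def hybrid_system_prompt(*layers: str) -> str:
    """Concatenate prompt layers in order, so the template constrains the
    persona's tone and the reflection nudge always closes the prompt."""
    return "\n\n".join(layers)

prompt = hybrid_system_prompt(PERSONA, TEMPLATE, REFLECTION)
```

The ordering is the design point: the template guarantees the concise structure students preferred, while the trailing reflection question injects the chain-of-thought-style depth educators rated highest.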

Ethical and technical considerations were also addressed. The local hosting model ensured data privacy, while the authors acknowledged potential bias and model drift as ongoing challenges. They note that while STEM domains like electrical engineering are less prone to cultural bias, subtle linguistic disparities may still arise, requiring careful monitoring.

Operational cost remains another limiting factor. With large language model usage priced at USD 3 per million input tokens and USD 15 per million output tokens, large-scale adoption will require cost-efficient configurations or selective use of conversational memory to manage expenses.
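The quoted prices make the cost arithmetic easy to check. A minimal back-of-envelope model, where the per-exchange token counts are illustrative assumptions rather than figures from the paper:

```python
# Cost model using the per-token prices quoted in the study:
# USD 3 per million input tokens, USD 15 per million output tokens.
INPUT_USD_PER_M = 3.0
OUTPUT_USD_PER_M = 15.0

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Total USD cost for a given volume of input and output tokens."""
    return (input_tokens / 1e6 * INPUT_USD_PER_M
            + output_tokens / 1e6 * OUTPUT_USD_PER_M)

# Assumed example: 200 exchanges at ~800 input / ~500 output tokens each.
total = session_cost(200 * 800, 200 * 500)  # → 1.98 USD
```

Because output tokens cost five times as much as input tokens, trimming response length (or pruning conversational memory, which inflates input tokens each turn) is where the study's "cost-efficient configurations" would bite.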

Implications for the future of sustainable education

In the long term, the authors envision adaptive tutors capable of shifting from simple procedural guidance to reflective dialogue, gradually increasing cognitive engagement as students' AI literacy grows. Such designs could help close the preference–pedagogy gap, enabling AI systems to function not only as task assistants but as genuine facilitators of sustainable learning.

  • FIRST PUBLISHED IN:
  • Devdiscourse