ChatGPT vs DeepSeek: AI battle heats up over accuracy, privacy and performance

New research suggests that not all large language models (LLMs) are equal when it comes to accuracy, reasoning, and, most importantly, data privacy. The study has raised important questions about how leading models handle sensitive medical information and whether efficiency gains come at the cost of security and reliability.

The study evaluates two widely used AI systems, ChatGPT and DeepSeek, through the lens of privacy in medical applications. The findings point to a nuanced trade-off between performance, explainability, computational efficiency, and privacy safeguards in real-world healthcare scenarios.

The research, titled "Evaluation of ChatGPT vs. DeepSeek from a Privacy Perspective" and published in the journal Electronics, is based on an empirical assessment of both models using the MedQA dataset, a benchmark derived from medical licensing examinations.
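The article does not reproduce the paper's evaluation code, but the basic procedure of scoring a model on a MedQA-style multiple-choice subset can be sketched as follows (the `ask_model` helper, the JSONL field names, and the file path are illustrative assumptions, not details from the study):

```python
import json

def ask_model(question: str, options: dict[str, str]) -> str:
    """Placeholder for an API call to ChatGPT or DeepSeek.

    Expected to return a single option letter such as "A".
    This helper is an assumption for illustration, not the study's code.
    """
    raise NotImplementedError

def evaluate(path: str) -> float:
    """Score a model on a MedQA-style JSONL file whose records are assumed
    to contain 'question', 'options' (letter -> text), and 'answer_idx'."""
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            prediction = ask_model(item["question"], item["options"]).strip().upper()
            correct += prediction == item["answer_idx"]
            total += 1
    return correct / total if total else 0.0

# The study reports roughly 0.94 for ChatGPT and 0.91 for DeepSeek on its MedQA subset.
```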

Performance and reasoning divide highlights strengths of competing AI models

Both ChatGPT and DeepSeek demonstrate high levels of accuracy in medical question answering, reinforcing their growing role in education and clinical support. However, notable differences emerge in how each model processes and explains information.

ChatGPT achieved an accuracy rate of 94 percent on the MedQA subset, outperforming DeepSeek, which recorded 91 percent. This marginal advantage is attributed to ChatGPT's broader training base and optimization for general-purpose question answering. Its ability to deliver faster responses also stood out, with performance metrics showing it to be approximately one-third quicker than DeepSeek in generating answers.

DeepSeek distinguished itself in areas that are critical for medical practice, particularly structured reasoning and explainability. The model's reliance on chain-of-thought reasoning allows it to break down complex clinical problems into step-by-step logic, producing responses that domain experts found more coherent and educationally valuable.
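How chain-of-thought output is elicited in practice is not specified in the article; one common approach is simply to request the reasoning in the prompt, as in this hedged sketch (the prompt wording is an assumption, not DeepSeek's or the study's actual prompt):

```python
def build_prompts(question: str, options: str) -> tuple[str, str]:
    """Return a direct-answer prompt and a chain-of-thought prompt
    for the same multiple-choice clinical question."""
    direct = (
        f"{question}\n{options}\n"
        "Answer with the single best option letter only."
    )
    chain_of_thought = (
        f"{question}\n{options}\n"
        "Reason step by step: restate the key clinical findings, rule out "
        "each option you reject, then give the single best option letter "
        "on the final line as 'Answer: <letter>'."
    )
    return direct, chain_of_thought
```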

This difference is especially relevant in medical contexts, where understanding the rationale behind a diagnosis or treatment recommendation is as important as the answer itself. The study notes that DeepSeek's explanations were consistently rated higher for clarity, logical coherence, and clinical relevance, even when its final answers were slightly less accurate.

The divergence reflects deeper architectural differences. ChatGPT, built on large-scale transformer models with multimodal capabilities, excels in versatility and speed. DeepSeek, by contrast, leverages a mixture-of-experts framework and distillation techniques, enabling it to operate efficiently on fewer computational resources while maintaining strong reasoning capabilities.
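The article describes the mixture-of-experts idea only at a high level. As a rough, assumed illustration of the general technique, not DeepSeek's actual architecture, a gating layer scores a set of expert sub-networks and evaluates only the top few for each token, which is where the efficiency gain comes from:

```python
import numpy as np

def top_k_moe(x: np.ndarray, gate_w: np.ndarray, experts: list, k: int = 2) -> np.ndarray:
    """Toy top-k mixture-of-experts layer for a single token vector x.

    gate_w: (d_model, n_experts) gating weights
    experts: callables mapping a (d_model,) vector to a (d_model,) vector
    Only the k highest-scoring experts are evaluated, so compute per token
    stays well below that of a dense model with the same total parameters.
    """
    logits = x @ gate_w                       # one gating score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                  # softmax over the selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Tiny usage example: 4 random linear "experts" on an 8-dimensional token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.standard_normal((d, d)): v @ W for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
token = rng.standard_normal(d)
print(top_k_moe(token, gate_w, experts).shape)  # (8,) -- only 2 of the 4 experts ran
```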

However, these efficiencies come with trade-offs. DeepSeek's reliance on distilled knowledge occasionally resulted in hallucinations—instances where the model generated incorrect or fabricated information. Although relatively rare, occurring in about 2 percent of responses, such errors could have significant implications in clinical settings if left unchecked.

Privacy safeguards show resilience but expose subtle vulnerabilities

The study focuses on how each model handles sensitive medical data, an area of growing concern as AI systems are increasingly used in healthcare environments. The evaluation framework examined whether model outputs contained direct or indirect identifiers, clinical details, or other forms of sensitive information that could compromise patient privacy.

The findings suggest that both ChatGPT and DeepSeek demonstrate strong baseline protections against direct data leakage. In test scenarios involving anonymized patient prompts, neither model reproduced identifiable information such as names or contact details. This indicates that current safeguards against explicit privacy breaches are largely effective.
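The study's privacy-evaluation framework is not reproduced in the article; a minimal, assumed sketch of how model responses could be screened automatically for direct identifiers before human review might look like this (the patterns are illustrative and far from exhaustive):

```python
import re

# Illustrative patterns for direct identifiers; a production screen would use
# a dedicated PII/PHI detection pipeline, not regexes alone.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "record_number_like": re.compile(r"\b\d{6,10}\b"),  # crude ID-shaped digit runs
}

def find_direct_identifiers(text: str) -> dict[str, list[str]]:
    """Return substrings of a model response that match the direct-identifier
    patterns above; an empty dict means no hits for this crude screen."""
    hits = {name: pat.findall(text) for name, pat in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

# Example: flags the phone-like digits but finds nothing in a clean answer.
print(find_direct_identifiers("Follow up at 555 010 2468 regarding the biopsy."))
print(find_direct_identifiers("The most likely diagnosis is community-acquired pneumonia."))
```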

However, the study also uncovered more subtle vulnerabilities. In a small number of cases, models produced responses that included indirect hints related to patient conditions or medical history. While not constituting explicit data leaks, these outputs highlight the complexity of ensuring complete privacy in AI-generated content, particularly when dealing with nuanced clinical scenarios.

The research emphasizes that privacy risks in AI are not limited to direct exposure of identifiable data. Models trained on large datasets may inadvertently reproduce patterns or fragments of sensitive information, especially if training data includes real-world clinical records. ChatGPT, with its extensive training corpus, carries a theoretical risk of memorizing and reproducing such patterns, while DeepSeek's domain-specific retrieval mechanisms could expose sensitive data if underlying datasets are not properly sanitized.

The study further notes that privacy considerations extend beyond model outputs to system design and deployment. Subscription-based models, API access, and data storage practices all influence how user data is handled, raising broader questions about governance and compliance with regulations such as GDPR and healthcare-specific standards.

Cost, accessibility, and future adoption shape AI's role in healthcare education

The study highlights economic and operational factors that will shape the adoption of AI in healthcare. One of DeepSeek's most significant advantages lies in its efficiency. The model is designed to operate with substantially fewer computational resources, reportedly achieving comparable performance at a fraction of the cost associated with larger models like ChatGPT.

This cost efficiency has important implications for accessibility, particularly in resource-constrained settings. While many leading AI models operate on subscription-based frameworks that limit access to advanced features, DeepSeek's lower resource requirements and open-source orientation offer a more inclusive alternative.

However, affordability alone does not determine adoption. The study underscores the importance of reliability, scalability, and integration with existing systems. ChatGPT's multimodal capabilities, including support for text, audio, and visual inputs, make it more adaptable to diverse educational and clinical applications. DeepSeek, while strong in text-based reasoning, currently lacks these features, limiting its versatility.

The broader landscape of LLMs reflects this diversity of approaches. General-purpose systems like ChatGPT, Gemini, and Claude prioritize flexibility and user experience, while domain-specific models such as DeepSeek focus on specialized tasks and efficiency. Each approach carries distinct advantages and limitations, requiring stakeholders to align model selection with specific use cases.

The study also points to the growing role of AI in medical education, where these tools are being used to simulate clinical scenarios, generate study materials, and provide personalized feedback. By tailoring explanations to individual learning needs, AI has the potential to enhance comprehension and bridge knowledge gaps. However, the effectiveness of such applications depends on the accuracy, transparency, and ethical design of the underlying models.

The researchers call for more comprehensive evaluations that go beyond accuracy metrics to include privacy, explainability, and real-world applicability. They highlight the need for standardized benchmarks and regulatory frameworks to guide the safe deployment of AI in healthcare.

First published in: Devdiscourse