OTOLARYNGOLOGY-HEAD AND NECK SURGERY, 2026 (SCI-Expanded, Scopus)
Objective
Artificial intelligence-supported large language models (LLMs) have become increasingly widespread in health communication and patient education in recent years. Models such as ChatGPT, Claude, Gemini, and DeepSeek are used to provide information on complex medical topics, thanks to their natural language processing capabilities. This study compares the responses of these models to 5 frequently asked questions about cochlear implants in terms of content and communication quality.

Study Design
Comparative analysis of 4 LLMs using expert-evaluated responses to cochlear implant queries.

Setting
Virtual simulation with blinded specialist assessments.

Methods
Five of the most frequently searched cochlear implant questions on Google were selected. Each question was posed individually to ChatGPT-4, Gemini 2.0, Claude 3.7, and DeepSeek v3. The responses from each model were evaluated by 5 otolaryngology specialists using a 5-point scale for content accuracy and communication appropriateness. One-way ANOVA and post hoc tests were used for statistical analysis.

Results
Statistically significant differences were identified among the models in both content and communication quality (P < .05). The DeepSeek model achieved the highest average scores in both areas, while the Claude model generally received the lowest scores. ChatGPT-4 demonstrated balanced performance, and Gemini stood out on certain communication criteria.

Conclusion
This study is one of the first comparative analyses to evaluate the performance of 4 different large language models in the context of patient education about cochlear implants. Although some models appear more suitable for patient education, the findings indicate that these systems still have limitations when used without expert oversight.
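As a minimal sketch of the described statistical workflow (one-way ANOVA followed by post hoc pairwise comparisons), the Python code below uses hypothetical rater scores, not the study's data; the choice of Tukey's HSD as the post hoc test and the group sizes are assumptions for illustration only.

# Illustrative sketch only: hypothetical 5-point expert ratings per model.
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

rng = np.random.default_rng(0)

# Hypothetical content-accuracy scores (5 raters x 5 questions = 25 ratings per model).
scores = {
    "ChatGPT-4": rng.integers(3, 6, 25),
    "Gemini 2.0": rng.integers(3, 6, 25),
    "Claude 3.7": rng.integers(2, 5, 25),
    "DeepSeek v3": rng.integers(4, 6, 25),
}
groups = list(scores.values())

# Omnibus one-way ANOVA across the four models.
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Post hoc pairwise comparisons (Tukey's HSD assumed) if the omnibus test is significant.
if p_value < 0.05:
    result = tukey_hsd(*groups)
    names = list(scores.keys())
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            print(f"{names[i]} vs {names[j]}: p = {result.pvalue[i, j]:.4f}")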