GAZI MEDICAL JOURNAL, vol.36, no.4, pp.407-416, 2025 (ESCI, Scopus)
Objective: While large language models (LLMs) have been increasingly evaluated for medical inquiries, their responses to questions about autism spectrum disorder (ASD) remain underexplored. This study aims to evaluate and compare four publicly available LLMs (ChatGPT-3.5, ChatGPT-4.0, Google Gemini, and Microsoft Copilot) on autism-related queries.

Methods: Nineteen frequently asked autism-related questions were categorized into symptoms, diagnosis, treatment, and general information. The responses from each LLM were evaluated by three child and adolescent psychiatrists using the Patient Education Materials Assessment Tool (PEMAT) and the Global Quality Score. Thematic analysis was conducted to identify key topics. A majority-consensus approach determined the final ratings, and sentiment analysis was performed to assess emotional polarity and subjectivity.

Results: ChatGPT-4.0 demonstrated superior overall response quality compared to Microsoft Copilot and Google Gemini (p=0.006 and p=0.009, respectively). While the overall understandability of responses was similar across all LLMs, ChatGPT-4.0 scored significantly higher than Microsoft Copilot on the content subscale (p=0.026), and Google Gemini outperformed ChatGPT-4.0 in word choice and style (p=0.041). Thematic analysis revealed that all chatbots emphasized early diagnosis and behavioral issues. Sentiment analysis indicated a high degree of objectivity across all models. Google Gemini displayed the highest polarity score (0.115), while subjectivity scores were moderately high across all chatbots, with ChatGPT-4.0 exhibiting the highest subjectivity score (0.452).

Conclusion: This study highlights the potential of LLMs, particularly ChatGPT-4.0, to deliver high-quality and easily understandable information regarding ASD. However, given the limitations of LLMs, including their susceptibility to bias and lack of real-world reasoning, further research is needed.