Evaluating the Efficacy of Large Language Models in Guiding Treatment Decisions for Pediatric Refractive Error

Ophthalmol Ther. 2025 Feb 22. doi: 10.1007/s40123-025-01105-2. Online ahead of print.

ABSTRACT

INTRODUCTION: Effective management of pediatric myopia, which includes treatments like corrective lenses and low-dose atropine, requires accurate clinical decisions. However, the complexity of pediatric refractive data, such as variations in visual acuity, axial length, and patient-specific factors, poses challenges to determining optimal treatment. This study aims to evaluate the performance of three large language models in analyzing these refractive data.

METHODS: A dataset of 100 pediatric refractive records, including parameters like visual acuity and axial length, was analyzed separately with ChatGPT-3.5, ChatGPT-4o, and Wenxin Yiyan. Each model was tasked with determining whether intervention was needed and subsequently recommending a treatment (eyeglasses, orthokeratology lens, or low-dose atropine). The recommendations were compared to professional optometrists’ consensus, rated on a 1-5 Global Quality Score (GQS) scale, and evaluated for clinical safety using a three-tier accuracy assessment.
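
As an illustration of the comparison described above, the following minimal Python sketch computes agreement with a consensus label and summarizes GQS ratings. It is not the authors' analysis code: the function names, the toy data, and the cutoff of 4 for a "good" rating are all assumptions made for this example, since the abstract does not define them.

```python
from statistics import mean, stdev


def intervention_accuracy(model_decisions, consensus_decisions):
    """Fraction of records where the model's yes/no intervention
    decision matches the consensus decision."""
    if len(model_decisions) != len(consensus_decisions):
        raise ValueError("decision lists must have the same length")
    matches = sum(m == c for m, c in zip(model_decisions, consensus_decisions))
    return matches / len(consensus_decisions)


def summarize_gqs(scores, good_threshold=4):
    """Mean, sample standard deviation, and share of 'good' ratings.

    good_threshold is a hypothetical cutoff; the abstract does not
    state which GQS values count as "good".
    """
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "good_share": sum(s >= good_threshold for s in scores) / len(scores),
    }


# Toy data for illustration only (not taken from the study).
consensus = [True, True, False, True, False]
model = [True, False, False, True, False]
gqs = [5, 4, 4, 3, 5]

accuracy = intervention_accuracy(model, consensus)
summary = summarize_gqs(gqs)
print(f"accuracy={accuracy:.2f}, GQS={summary['mean']:.2f} ± {summary['sd']:.2f}, "
      f"good={summary['good_share']:.0%}")
```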

RESULTS: ChatGPT-4o outperformed both ChatGPT-3.5 and Wenxin Yiyan in determining intervention needs, with an accuracy of 90%, significantly higher than Wenxin Yiyan (p < 0.05). It also achieved the highest GQS of 4.4 ± 0.55, surpassing the other models (p < 0.001), with 85% of responses rated as “good,” ahead of ChatGPT-3.5 (82%) and Wenxin Yiyan (74%). ChatGPT-4o made only eight errors in recommending interventions, fewer than ChatGPT-3.5 (12) and Wenxin Yiyan (15). Additionally, it performed better with incomplete or abnormal data, maintaining higher quality scores.

CONCLUSION: ChatGPT-4o showed better accuracy and clinical safety, making it a promising tool for decision support in pediatric ophthalmology, although expert oversight is still necessary.

PMID:39985747 | DOI:10.1007/s40123-025-01105-2