ChatGPT-4 (OpenAI) had an overall “fair” performance when answering multiple-choice ophthalmic questions related to multimodal imaging.
A Canadian study, led by first author Andrew Mihalache, MD, of the Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada, reported that ChatGPT-4 (OpenAI) had an overall “fair” performance when answering multiple-choice ophthalmic questions related to multimodal imaging.1
Correct interpretation of clinical images is at the heart of ophthalmology and is essential to ensure appropriate treatment. With the rapid development of artificial intelligence (AI) worldwide, the accuracy of technologies such as chatbots is imperative.
The authors commented on the importance of this technology: “Ophthalmology is reliant on effective interpretation of multimodal imaging to ensure diagnostic accuracy. Multimodal imaging enhances patient outcomes through earlier and more precise diagnoses, and more effective follow-up visits and treatments.2,3 The new release of the chatbot holds great potential in enhancing the efficiency of ophthalmic image interpretation, which may reduce the workload on clinicians, mitigate variability in interpretations and errors, and ultimately, lead to improved patient outcomes.”
They conducted a cross-sectional study to evaluate how ChatGPT-4 performed when processing imaging data. A publicly available dataset of ophthalmic cases, OCTCases, a medical education platform from the Department of Ophthalmology and Vision Sciences at the University of Toronto, was used. A total of 137 cases were available, 99% of which had multiple-choice questions, the authors explained. The study’s primary outcome was the accuracy of the chatbot in answering these questions related to image recognition.
Among the 136 cases that contained multiple-choice questions, the chatbot was tasked with fielding 429 multiple-choice questions; 448 images also were included in the analysis.
“The chatbot answered 299 multiple-choice questions correctly across all cases (70%). The chatbot’s performance was better on retina questions than neuro-ophthalmology questions (77% vs 58%; difference = 18%; 95% confidence interval [CI], 7.5%-29.4%; χ²₁ = 11.4; P < 0.001),” Dr. Mihalache and colleagues reported.
They also found that the chatbot did better answering non–image-based questions compared with image-based questions (82% vs 65%; difference = 17%; 95% CI, 7.8%-25.1%; χ²₁ = 12.2; P < 0.001).
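For readers curious how comparisons of this kind are computed, the short sketch below runs a standard 2 × 2 chi-square test of two proportions in Python using scipy. Because the article reports only percentages, the per-category question counts in the code are hypothetical stand-ins chosen to roughly approximate the reported 77% versus 58% split for retina versus neuro-ophthalmology questions; they are not the study’s data.

# Illustrative sketch only: the counts below are hypothetical stand-ins,
# chosen to roughly match the reported 77% vs 58% accuracy split.
from scipy.stats import chi2_contingency

retina_correct, retina_total = 100, 130   # hypothetical counts
neuro_correct, neuro_total = 38, 65       # hypothetical counts

# 2 x 2 table: rows = question category, columns = correct / incorrect
table = [
    [retina_correct, retina_total - retina_correct],
    [neuro_correct, neuro_total - neuro_correct],
]

chi2, p, dof, _ = chi2_contingency(table, correction=False)
diff = retina_correct / retina_total - neuro_correct / neuro_total
print(f"difference = {diff:.1%}, chi-square({dof} df) = {chi2:.1f}, p = {p:.4f}")

The same approach applies to the image-based versus non–image-based comparison; only the cell counts change.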
Finally, the chatbot showed intermediate performance on questions in ocular oncology (72% correct), pediatric ophthalmology (68% correct), uveitis (67% correct), and glaucoma (61% correct).
The authors concluded, “In this study, the recent version of the chatbot accurately responded to most multiple-choice questions pertaining to ophthalmic cases requiring multimodal input from OCTCases, albeit performing better on questions that did not rely on ophthalmic imaging interpretation. As multimodal large language models become increasingly widespread, it remains imperative to continuously stress their appropriate use in medicine and highlight concerns surrounding confidentiality and bioethics. Future studies should continue investigating the chatbot’s ability to interpret different ophthalmic imaging modalities to gauge whether it can eventually become as accurate as specific machine learning systems in ophthalmology. Future work should also evaluate the chatbot’s ability to interpret ophthalmic images that are not publicly accessible.”