In a new study, UCLA investigators detail a deep-learning model, pre-trained on 2D scans, that accurately predicts disease-risk factors from 3D medical-scan modalities.
A team of researchers at UCLA has developed a deep-learning framework that quickly teaches itself to analyze and diagnose MRIs and other 3D medical images, matching the accuracy of medical specialists in a fraction of the time.
An article describing the work and the system’s capabilities is published in Nature Biomedical Engineering.1,2
A few models for analyzing 3D images are under development, but the investigators noted that this new framework offers wide adaptability across a range of imaging modalities. The researchers have tested it with 3D retinal scans (optical coherence tomography) for disease-risk biomarkers, ultrasound videos to examine heart function, 3D MRI scans to assess the severity of liver disease, and 3D CT scans for chest-nodule malignancy screening. They note that it offers a foundation that ultimately may be of value in a variety of other clinical settings, and additional studies are planned.1
Artificial neural networks train by performing many repeated calculations on extensive datasets that have been examined and labeled by clinical experts. Unlike standard 2D images, which capture only length and width, 3D imaging technologies add depth, and these "volumetric" images require additional skill, time and attention to interpret. For example, a single 3D retinal scan can comprise about 100 2D images, requiring several minutes of close inspection by a highly trained clinical specialist to detect subtle disease biomarkers, such as measuring the volume of an anatomical swelling.2
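As a rough illustration of what "volumetric" means in practice, a 3D scan can be represented as a stack of 2D slices. The slice count and pixel dimensions below are hypothetical examples, not figures from the paper:

```python
import numpy as np

# Hypothetical OCT volume: 100 2D slices, each 256x256 pixels.
# Real scan dimensions vary by device and imaging protocol.
num_slices, height, width = 100, 256, 256
volume = np.zeros((num_slices, height, width), dtype=np.float32)

# A 2D model sees one slice at a time; interpreting the full volume
# (e.g., measuring the extent of an anatomical swelling) requires
# reasoning across all slices jointly.
one_slice = volume[0]  # shape (256, 256): a single 2D image
print(volume.shape, one_slice.shape)
```

This framing is why a 3D scan takes far longer for a specialist to read than a single 2D image: the information of interest is spread across the whole stack.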
Oren Avram, PhD, a postdoctoral researcher at UCLA Computational Medicine and first author on the paper, offered some insight into the research.
“While there are many AI (artificial intelligence) methods for analyzing 2D biomedical imaging data, compiling and annotating large volumetric datasets that would be required for standard 3D models to exhaust AI’s full potential is infeasible with standard resources,” Avram said in the UCLA news release. “Several models exist, but their training efforts typically focus on a single imaging modality and a specific organ or disease.”
The UCLA computer model, called SLIViT (SLice Integration by Vision Transformer), combines two artificial-intelligence components with a unique learning approach that, the investigators note, enables it to accurately predict disease-risk factors from medical scans across multiple volumetric modalities using only moderately sized labeled datasets.2
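The "slice integration" idea can be sketched in code: a 2D feature extractor is applied to each slice, and a transformer then integrates the per-slice features into a volume-level prediction. The sketch below is illustrative only; the layer sizes, class names, and the toy backbone are assumptions, not the authors' SLIViT implementation:

```python
import torch
import torch.nn as nn

class SliceIntegrationSketch(nn.Module):
    """Minimal sketch of a two-stage design: a 2D feature extractor
    applied slice-by-slice, then a transformer that integrates the
    slice features. NOT the published SLIViT code -- in practice the
    2D backbone would be pre-trained on large, accessible 2D image
    datasets (the "prior medical knowledge" the authors describe)."""

    def __init__(self, feat_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        # Stand-in 2D backbone producing one feature vector per slice.
        self.backbone2d = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (N, feat_dim)
        )
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.integrator = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(feat_dim, 1)  # e.g., one risk-factor score

    def forward(self, volume):  # volume: (batch, slices, height, width)
        b, s, h, w = volume.shape
        slices = volume.reshape(b * s, 1, h, w)
        feats = self.backbone2d(slices).reshape(b, s, -1)  # (B, S, D)
        fused = self.integrator(feats).mean(dim=1)  # pool over slices
        return self.head(fused)

model = SliceIntegrationSketch()
scan = torch.randn(2, 100, 64, 64)  # 2 hypothetical volumes, 100 slices each
out = model(scan)
print(out.shape)  # torch.Size([2, 1])
```

Because the slice-level feature extractor is 2D, it can be pre-trained on abundant 2D data, which is one plausible way a design like this sidesteps the scarcity of labeled 3D volumes.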
“SLIViT overcomes the training dataset size bottleneck by leveraging prior ‘medical knowledge’ from the more accessible 2D domain,” said Berkin Durmus, a UCLA PhD student and co-first author of the article.
Like Avram, Durmus is affiliated with the UCLA Henry Samueli School of Engineering and other UCLA schools and departments.
“We show that SLIViT, despite being a generic model, consistently achieves significantly better performance compared to domain-specific state-of-the-art models. It has clinical applicability potential, matching the accuracy of manual expertise of clinical specialists while reducing time by a factor of 5,000,” he said in the release. “And unlike other methods, SLIViT is flexible and robust enough to work with clinical datasets that are not always in perfect order.”
Moreover, Avram explained that SLIViT's automated annotation could ultimately benefit patients and clinicians by increasing diagnostic efficiency and timeliness, and could advance and accelerate medical research by reducing data-acquisition costs and duration. It also offers a foundation model that can speed the development of future predictive models.
SriniVas R. Sadda, MD, a professor of Ophthalmology at UCLA Health and the Artificial Intelligence & Imaging Research director at the Doheny Eye Institute, lauded the research.
“What thrilled me most was SLIViT’s remarkable performance under real-life conditions, particularly with low-number training datasets,” Sadda said. “SLIViT thrives with just hundreds – not thousands – of training samples for some tasks, giving it a substantial advantage over other standard 3D-based methods in almost every practical case related to 3D biomedical imaging annotation.”
Eran Halperin, PhD, a professor of Computer Science at the Henry Samueli School of Engineering and Computational Medicine at the UCLA David Geffen School of Medicine, explained that even if financial resources were unlimited, research likely will always face challenges presented by limited training datasets – in clinical environments, for instance, or when considering emerging biomedical-imaging modalities.
“When a new disease-related risk factor is identified, it can take months to train specialists to accurately annotate the new factor at scale in biomedical images,” he said. “But with a relatively small dataset, which a single trained clinician can annotate in just a few days, SLIViT can dramatically expedite the annotation process for many other non-annotated volumes, achieving performance levels comparable to clinical specialists.”
Sadda and Halperin are co-senior authors of the paper.
While the investigators will expand their studies to include additional treatment modalities, they will also investigate how SLIViT can be leveraged for predictive disease forecasting to enhance early diagnosis and treatment planning. To promote its clinical applicability, they will further examine ways to ensure that systematic biases in AI models do not contribute to health disparities.1