ABSTRACT
Objective
To investigate integrating an artificial intelligence (AI) system into diagnostic breast ultrasound (US) for improved performance.
Materials and Methods
Seventy suspicious breast mass lesions (53 malignant and 17 benign) from seventy women who underwent diagnostic breast US complemented with shear wave elastography, US-guided core needle biopsy and verified histopathology were enrolled. Two radiologists, one with 15 years of experience and the other with one year of experience, evaluated the images for breast imaging-reporting and data system (BI-RADS) scoring. The less-experienced radiologist re-evaluated the images with the guidance of a commercial AI system and the maximum elasticity from shear wave elastography. The BI-RADS scorings were processed to determine diagnostic performance and malignancy detections.
Results
The experienced reader demonstrated superior performance with an area under the curve (AUC) of 0.888 [95% confidence interval (CI): 0.793–0.983], indicating high diagnostic accuracy. In contrast, the Koios decision support (DS) system achieved an AUC of 0.693 (95% CI: 0.562–0.824). The less-experienced reader, guided by both Koios and elasticity, showed an AUC of 0.679 (95% CI: 0.534–0.823), while guidance by Koios alone resulted in an AUC of 0.655 (95% CI: 0.512–0.799). Without any guidance, the less-experienced reader exhibited the lowest performance, with an AUC of 0.512 (95% CI: 0.352–0.672). The experienced reader had a sensitivity of 98.1%, specificity of 58.8%, positive predictive value of 88.1%, negative predictive value of 90.9%, and overall accuracy of 88.6%. The Koios DS showed a sensitivity of 92.5%, specificity of 35.3%, and an accuracy of 78.6%. The less-experienced reader, when guided by both Koios and elasticity, achieved a sensitivity of 92.5%, specificity of 23.5%, and an accuracy of 75.7%. When guided by Koios alone, the less-experienced reader had a sensitivity of 90.6%, specificity of 17.6%, and an accuracy of 72.9%. Lastly, the less-experienced reader without any guidance showed a sensitivity of 84.9%, specificity of 17.6%, and an accuracy of 68.6%.
Conclusion
Diagnostic evaluation of the suspicious masses on breast US images largely depends on experience, with experienced readers showing good performances. AI-based guidance can help improve lower performances, and using the elasticity metric may further improve the performances of less experienced readers. This type of guidance may reduce unnecessary biopsies by increasing the detection rate for malignant lesions and deliver significant benefits for routine clinical practice in underserved areas where experienced readers may not be available.
Key Points
• Artificial intelligence (AI) decision support software can enhance the characterization of preselected ultrasound lesions by providing breast imaging-reporting and data system scoring and associated cancer risk predictions.
• AI systems can support radiologists, particularly those with less experience.
• The integration of AI-guided systems with shear wave elastography measurements demonstrates potential for reducing unnecessary biopsies.
Introduction
Breast cancer is the second most common cancer among women worldwide (1). It is a complex disease with varying types and stages, necessitating early detection and accurate diagnosis for effective treatment. Mammography has been at the forefront as the first modality of breast cancer detection for years. However, significant advances in imaging technologies have produced more refined, accessible, and less invasive modalities. Among these, breast ultrasound (US) has emerged as a vital diagnostic tool, offering substantial benefits, especially when used alongside other imaging modalities (2). Its non-invasive nature, absence of radiation exposure, and ability to visualize dense breast tissue make it an excellent option for a wide range of patients, including those for whom radiation is a concern (3).
The advent of sophisticated artificial intelligence (AI) systems has further enhanced the capability of breast US (4). Research indicates that AI, known for its high accuracy and sensitivity, holds great promise in assisting radiologists and breast specialists (5). Particularly for those with less experience, AI can serve as a valuable tool, helping to reduce the rate of misdiagnosis and ensure patients receive timely and appropriate treatment (6). Despite its low specificity, the potential of AI in improving US diagnostics should not be understated, underlining the need for ongoing research in this exciting intersection of technology and medicine. Commercially available AI systems for breast US are designed to enhance diagnostic accuracy, streamline workflow, and improve the overall efficiency of breast cancer detection (7). The Koios decision support (DS) system has been designed to assist radiologists in classifying and diagnosing breast lesions using US imaging. Koios DS uses deep learning algorithms to analyze US images and provides a breast imaging-reporting and data system (BI-RADS) classification to help determine the necessity of a biopsy (8). S-Detect for Breast US is another software integrated into US devices that uses an advanced AI algorithm to analyze and classify the characteristics of breast lesions in US images, aiming to increase diagnostic accuracy and efficiency. Like Koios DS, S-Detect provides standardized reporting and can assist in reducing variability among different examiners (9). QVCAD is a computer-aided detection system designed for use in both breast US and mammography. It aids radiologists by highlighting areas that may warrant a closer look, thus potentially improving detection rates and reducing the time taken to review images (10, 11). Each system brings a unique approach to integrating AI into breast US imaging.
The common goal is to support radiologists by providing a second opinion, reducing the chance of missed diagnoses, and improving the specificity and sensitivity of breast cancer detection through US, serving as an adjunct to the radiologist’s expertise (6, 12). This can potentially decrease unnecessary biopsies and allow for a more accurate and timely diagnosis of breast cancer (8).
Continual advances in AI technology and ongoing research ensure AI systems become increasingly sophisticated, further revolutionizing breast cancer diagnostics using US imaging. The aim of the current study was to assess the role of Koios DS in augmenting the capabilities of experienced and less experienced radiologists and improving efficiency and accuracy in diagnosing breast lesions.
Materials and Methods
Study Population
The institutional review board approved the study (İstanbul Bilgi University, Committee on Ethics in Research on Humans with the approval number of 2024-50162-062, date: 04.03.2024), and written informed consent was obtained from participants. Initially, a cohort of 80 patients who had been admitted to our institute with a suspicion of breast cancer and had undergone diagnostic breast US imaging and US-guided core needle biopsy between September 2022 and August 2023 was considered. The exclusion criteria were pregnancy or breastfeeding, neoadjuvant chemotherapy, a prior US-guided core needle biopsy of the target lesion, and poor-quality US images. Finally, seventy patients with a total of seventy suspicious breast lesions (53 malignant and 17 benign) were enrolled in this retrospective study.
Breast US Imaging and Ultrasound-Guided Biopsy
The breast US imaging was conducted using the GE LOGIQ E9 system (GE Healthcare, USA). The imaging covered conventional B-mode and color Doppler imaging using a high-frequency broad-bandwidth linear matrix array transducer (ML6-15 transducer, GE Healthcare) and shear wave elastography (SWE) using a linear array probe (9-L probe, GE Healthcare). During SWE imaging, patients were instructed to hold their breath for five seconds to reduce motion artefacts. Lesions were imaged along their longest diameter with minimal pressure applied to the breast. The imaging display featured a side-by-side panel for B-mode and SWE images, allowing real-time breast lesion evaluation. A circular region of interest was positioned on the stiffest part of the lesion, and the maximum elasticity was measured in kilopascals (13, 14). The two orthogonal transverse and sagittal grayscale US images and the maximum elasticity measurement for each lesion were stored in the picture archiving and communication system (PACS).
US-guided biopsy procedures were performed with an automated biopsy gun equipped with a 14-gauge needle (Bard Peripheral Vascular, Inc., USA). This technique combines the real-time imaging capabilities of US with the precision of an automated biopsy gun to collect tissue samples from inside the body, which improves the accuracy of diagnoses, reduces the risk of complications, and often requires only local anesthesia, making it a preferred option for many patients and healthcare providers alike (15). The obtained biopsy specimens were subsequently dispatched for standard histopathological evaluation, which was considered the gold standard diagnosis for the analyses in the current study.
AI Augmented Image Evaluation
The orthogonal transverse and sagittal grayscale breast US images of the lesions were input into the Koios DS study tool (version 2.3.0; Koios Medical Inc., IL, USA), an AI-based computerized image analysis software that is implemented in the PACS and is not available on the US machine. The user marked the centers of the lesions on the images, and the tool then automatically segmented the lesions. The user had the option to correct the segmentation manually. Finally, the tool extracted morphological features for the lesions and used them to provide a risk indicator for the likelihood of malignancy. The risk indicator fell into four categories: “Benign,” which indicated a BI-RADS 2 assessment; “Probably Benign,” which indicated a BI-RADS 3 assessment; “Suspicious,” which indicated a BI-RADS 4A/B assessment; and “Probably Malignant,” which indicated a BI-RADS 4C+ assessment (for illustrations, see Figures 1, 2).
Image Evaluation by the Readers
The orthogonal transverse and sagittal grayscale breast US images of each mass lesion were reviewed by two readers blinded to the patient’s clinical data. Of the two readers, one had 15 years of experience (F.C.), whereas the other had one year of experience (M.O.) in breast US imaging. The experienced reader conducted real-time US, while the less-experienced reader assessed the images stored in the PACS. Based on the US characteristics, the readers assessed the echotexture of the mass lesions together with their shape, margin, size, echogenicity, posterior features, elasticity, and vascularity, and classified the lesions using BI-RADS (16).
One month after the initial evaluation, the less-experienced reader was allowed to revise the previous BI-RADS assessments. This deliberate interval was intended to reduce potential bias and improve the objectivity of the repeated assessment. The revision was a two-stage review: the reader was first informed of the categorizations by the Koios AI system and then, as an adjunct, of the categorizations based on the maximum elasticity. At each stage, any changes to the assessments were recorded.
The BI-RADS assessments by the two readers and the AI system were recorded and used to evaluate overall diagnostic performance. To quantify detection performance, the assessments were dichotomized into benign or malignant as follows: BI-RADS 2 and 3 were designated as benign, while BI-RADS 4A/B and 4C+ were considered malignant.
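The dichotomization rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study (which was performed in SPSS); the function name dichotomize and the category strings are hypothetical.

```python
# Illustrative sketch only; the names and category strings are assumptions.
BENIGN_CATEGORIES = {"2", "3"}
MALIGNANT_CATEGORIES = {"4A", "4B", "4C+"}


def dichotomize(birads: str) -> str:
    """Map a BI-RADS category to 'benign' or 'malignant' using the rule in the text."""
    category = birads.strip().upper()
    if category in BENIGN_CATEGORIES:
        return "benign"
    if category in MALIGNANT_CATEGORIES:
        return "malignant"
    # Categories outside the four used in this study are rejected, not guessed.
    raise ValueError(f"Unhandled BI-RADS category: {birads!r}")
```

Rejecting unknown categories rather than assigning a default keeps a data-entry error from silently entering the benign or malignant counts.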
Statistical Analysis
Youden’s analysis was performed to determine the optimum lower and upper thresholds for the maximum elasticity measure, ensuring sensitivity and specificity at a level of 95%. The lesions with elasticity lower than the lower threshold were categorized as benign, while those with elasticity higher than the upper threshold were categorized as malignant. The lesions with elasticity between the lower and upper thresholds were acknowledged as non-specific.
Overall diagnostic performance was evaluated by plotting the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC). The performance was rated excellent, good, fair, poor, or fail if the AUC was 0.90–1.00, 0.80–0.89, 0.70–0.79, 0.60–0.69, or 0.50–0.59, respectively. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy metrics were used to evaluate the detection performances. They were considered very high, high, moderate, low, and very low if their values were 95–100%, 85–94.9%, 75–84.9%, 65–74.9%, and 0–64.9%, respectively. All analyses were performed using SPSS, version 25.0 statistical software (IBM Inc., Armonk, NY, USA).
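For readers unfamiliar with how an AUC arises from ordinal scores such as BI-RADS categories, the following sketch computes it as the probability that a randomly chosen malignant lesion receives a higher score than a randomly chosen benign lesion, with ties counted as one half (the Mann-Whitney formulation). It is not the SPSS procedure used in the study; the function name auc_from_scores and the numeric coding of categories are assumptions.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the AUC for ordinal scores; labels are 1 (malignant) or 0 (benign)."""
    malignant = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not malignant or not benign:
        raise ValueError("Need at least one malignant and one benign case")
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0   # malignant lesion ranked higher: concordant pair
            elif m == b:
                wins += 0.5   # tie: counted as half a concordant pair
    return wins / (len(malignant) * len(benign))


# Hypothetical toy data (not study data): categories coded 1..4 in
# increasing order of suspicion (BI-RADS 2, 3, 4A/B, 4C+).
toy_scores = [4, 3, 3, 2, 1]
toy_labels = [1, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

Because BI-RADS offers only a few categories, ties are frequent, which is why the half-credit for tied pairs matters in this setting.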
Results
Seventy women aged between 32 and 87 years (mean, 50.4 years) were included in the study. There were seventy breast masses, of which fifty-three were malignant (75.7%) and seventeen were benign (24.3%). Among the malignant masses, the predominant pathology was invasive ductal carcinoma (61.4%), while it was fibrosis among the benign masses (14.3%). The mass diameter ranged from 5 to 65 mm [mean, 19.9 mm; standard deviation (SD), 12.6 mm]. The mass elasticity varied from 10 to 184 kPa (mean, 87.8 kPa; SD, 45.6 kPa) (Table 1).
Figure 3 shows the Youden’s statistics plot of the sensitivity and specificity for the maximum elasticity (Emax) of the SWE. On this plot, requiring 95% sensitivity and specificity gives lower and upper thresholds of 20 kPa and 138 kPa for the elasticity, which quantifies the stiffness of tissues as an indicator for differentiating between benign and malignant breast lesions (malignant lesions tend to be stiffer than benign ones). Consequently, masses with Emax ≤20 kPa were classified as benign, those with 20 kPa < Emax <138 kPa as non-specific, and those with Emax ≥138 kPa as malignant.
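The three-way elasticity rule can be written out explicitly. The sketch below uses the 20 kPa and 138 kPa thresholds reported above; the function and constant names are hypothetical, and the code illustrates the rule rather than reproducing the study's software.

```python
# Thresholds reported in the text; the names are illustrative assumptions.
LOWER_THRESHOLD_KPA = 20.0
UPPER_THRESHOLD_KPA = 138.0


def classify_emax(emax_kpa: float) -> str:
    """Classify a lesion by its maximum shear wave elasticity in kilopascals."""
    if emax_kpa <= LOWER_THRESHOLD_KPA:
        return "benign"
    if emax_kpa >= UPPER_THRESHOLD_KPA:
        return "malignant"
    # Values strictly between the two thresholds carry no decision.
    return "non-specific"
```

For example, the cohort's mean elasticity of 87.8 kPa falls between the thresholds and would be labeled non-specific, so the elasticity adjunct influences the reader only for clearly soft or clearly stiff lesions.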
Table 2 tabulates the BI-RADS categorizations of the breast masses by the experienced reader, the less-experienced reader alone, the Koios DS AI system alone, and the less-experienced reader with either AI system support or AI system and elasticity data. The AI system decision guidance led to four upgrades (BI-RADS 3 to BI-RADS 4A) and one downgrade (BI-RADS 4A to BI-RADS 3). Moreover, incorporating the elasticity classification upgraded two lesions while downgrading two lesions, although one of the lesions was malignant. Figure 4 shows the plots for the ROC curves illustrating the overall diagnostic performances of the readers and the AI system considering the BI-RADS categories and the histopathological assessments. The experienced reader showed good overall diagnostic performance (AUC=0.888), and the Koios DS AI system attained fair overall diagnostic performance (AUC=0.693). The overall diagnostic performance of the less-experienced reader ranged from fail to poor depending on the guidance. The performance was at the fail level when no guidance was considered (AUC=0.512). However, it improved to poor when the reader received guidance from the AI system (AUC=0.655). Further improvement in the performance was accomplished when the reader was guided by both the AI system and the maximum elasticity from elastography (AUC=0.679) (Table 3).
The BI-RADS assessments by the two readers and the Koios DS AI system were dichotomized into benign or malignant detection (BI-RADS 2 and 3 were designated as benign, while BI-RADS 4A, 4B, and 4C+ were designated malignant). Table 4 tabulates the diagnostic performances of the experienced reader, the AI system, and the less-experienced reader after the dichotomization. Corresponding bar plots are shown in Figure 5. The experienced reader correctly diagnosed 52 out of the 53 malignant lesions and 10 out of the 17 benign lesions. The one misdiagnosed malignant lesion was an invasive ductal carcinoma. The misdiagnosed benign lesions were three cases of fibrosis, two fibroadenomas, one atypical ductal hyperplasia (ADH), and one sclerosing adenosis. Consequently, the experienced reader achieved 98.1% sensitivity, 58.8% specificity, 88.1% PPV, 90.9% NPV, and 88.6% accuracy in the diagnosis. The AI system correctly identified 49 of the 53 malignant lesions and 6 of the 17 benign lesions. The misclassified lesions were two invasive ductal carcinomas, two ductal carcinomas in situ (DCIS), six cases of fibrosis, three fibroadenomas, and two cases of sclerosing adenosis. The AI system demonstrated a sensitivity of 92.5%, specificity of 35.3%, PPV of 81.7%, NPV of 60.0%, and overall accuracy of 78.6%. The less-experienced reader correctly diagnosed 45 out of 53 malignant lesions and 3 out of 17 benign lesions without any guidance. The misdiagnosed lesions included seven invasive ductal carcinomas, one DCIS, seven fibrous lesions, four fibroadenomas, two cases of sclerosing adenosis, and one ADH. Consequently, the less-experienced reader achieved 84.9% sensitivity, 17.6% specificity, 76.3% PPV, 27.3% NPV, and 68.6% accuracy. When informed of the AI system classification, the less-experienced reader correctly diagnosed 48 of the 53 malignant lesions and 3 of the 17 benign lesions. This practice slightly improved the performance: 90.6% sensitivity, 17.6% specificity, 77.4% PPV, 37.5% NPV, and 72.9% accuracy.
When the less-experienced reader made a diagnosis knowing both the decision of the AI system and the classification based on the maximum elasticity, 49 out of the 53 malignant lesions and 4 out of the 17 benign lesions were correctly diagnosed. This practice further improved the performance: 92.5% sensitivity, 23.5% specificity, 79.0% PPV, 50.0% NPV, and 75.7% accuracy.
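The reported percentages follow directly from the lesion counts given in the text. The sketch below recomputes them from the true-positive, false-negative, true-negative, and false-positive counts; the function name diagnostic_metrics and the dictionary layout are assumptions for illustration, not part of the study's analysis.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity, PPV, NPV, and accuracy in percent (one decimal)."""
    def pct(numerator: int, denominator: int) -> float:
        return round(100.0 * numerator / denominator, 1)

    return {
        "sensitivity": pct(tp, tp + fn),   # malignant lesions correctly detected
        "specificity": pct(tn, tn + fp),   # benign lesions correctly identified
        "ppv": pct(tp, tp + fp),           # malignant calls that were truly malignant
        "npv": pct(tn, tn + fn),           # benign calls that were truly benign
        "accuracy": pct(tp + tn, tp + fn + tn + fp),
    }


# Counts taken from the text: 53 malignant and 17 benign lesions in total.
experienced = diagnostic_metrics(tp=52, fn=1, tn=10, fp=7)
koios = diagnostic_metrics(tp=49, fn=4, tn=6, fp=11)
guided_ai_and_elasticity = diagnostic_metrics(tp=49, fn=4, tn=4, fp=13)
```

Running the function on the counts for the other reading conditions (45/53 and 3/17 without guidance, 48/53 and 3/17 with AI guidance) reproduces the remaining values in the same way.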
Discussion and Conclusion
The current work reveals that the sensitivity and specificity of the AI system were lower than those of the experienced reader but higher than those of the less-experienced reader. In accordance with our study, Chabi et al. (12) showed that the contribution of computer-aided diagnosis (CAD) varied according to the level of experience of the radiologists, increasing sensitivity from approximately 88% to 99%. Similar to their results, the Koios DS AI system increased the sensitivity of the less-experienced reader from 84.9% to 90.6%, although the specificity remained the same in our study (12). Lee et al. (6) demonstrated that the diagnostic performance of the inexperienced group did not differ from or was lower than that of CAD, and adjunct use of CAD enhanced the performance from 0.65 to 0.71, similar to our findings, which showed an improvement in AUC from 0.512 to 0.655. Compared with S-Detect alone or conventional ultrasound alone, S-Detect combined with elastography showed higher accuracy and specificity (17).
Park et al. (18) demonstrated a significant enhancement in both PPV, increasing from 53.3% to 76.2%, and AUC, rising from 0.623 to 0.759, through the integration of CAD, which is consistent with our findings.
To the best of our knowledge, few published studies have compared AI with radiologists at varying levels of expertise and incorporated SWE measurements into their findings. Our results demonstrated that adding SWE to Koios DS guidance enhanced the AUC and specificity. Sun et al. (19) found that their combined AI model achieved an AUC of 0.89 and a specificity of 92%, exceeding the performance of individual models, including clinical, ultrasonic, elastography, and AI-only approaches. Our results support these findings. While US is a crucial imaging modality for detecting primary breast malignancies, recent studies have increasingly focused on AI-based advances. Although AI is advantageous for identifying internal textures, notable studies are lacking on the diagnosis of breast lesions, including DCIS, and on whether AI-assisted BI-RADS assessment can exceed the capabilities of radiologists and standardize BI-RADS categorization (20). Determining DCIS and other breast lesions through US is important for early preventive treatment measures. Yin et al. (21) showed that US radiomics-based AI can effectively differentiate between DCIS and benign fibroadenomas. However, Berg et al. (22) highlighted that AI software has not been trained on a sufficient number of US images of masses in the context of DCIS and that there is a need for improvement in this area. Our study also found that the AI missed two DCIS cases, both of which were correctly diagnosed by the experienced reader, while one was overlooked by the less-experienced reader, although not all the other lesions were benign. These results suggest that the AI’s diagnostic performance does not match that of an experienced reader in recognizing DCIS as malignant.
The current study has some limitations. Firstly, it was limited by its retrospective, single-center design and a relatively small number of patients. In addition, the breast US images were evaluated by only two radiologists, which may limit the generalizability of the findings. Furthermore, the use of Koios DS in the study was limited because it is not integrated within the US device and was only implemented in the PACS for the second evaluation by the less-experienced radiologist. This setup prevented the experienced radiologist from consulting the Koios results during real-time lesion assessment. Moreover, the study focused on lesions referred for biopsy, which may introduce a bias towards higher BI-RADS categories. Finally, the low specificity of the AI system should be kept in mind when the aim is to minimize unnecessary biopsies of benign lesions.
In conclusion, evaluating suspicious masses on breast US images requires experience, and the experience level determines the diagnostic performance. Experienced readers may show performance categorized as good, but the performance of less-experienced readers may be categorized as poor or lower and thus should be improved. AI-based guidance may improve the lower performances. Moreover, adopting the elasticity metric into the guidance may lead to further improvements in the performances of less-experienced readers. This type of guidance may reduce unnecessary biopsies by increasing the detection rate for malignant lesions and deliver significant benefits for routine clinical practice in underserved areas where experienced readers may not be available. The findings advocate for further exploration of AI guidance to improve diagnostic accuracy and patient outcomes in diagnostic breast US.
Ethics
Ethics Committee Approval: The institutional review board approved the study (İstanbul Bilgi University, Committee on Ethics in Research on Humans with the approval number of 2024-50162-062, date: 04.03.2024).
Informed Consent: Written informed consent was obtained from participants.
Authorship Contributions
Surgical and Medical Practices: F.Ç.; Concept: F.Ç., T.D., G.E.; Design: F.Ç., T.D., G.E.; Data Collection or Processing: F.Ç., O.T., M.O., G.E.; Analysis or Interpretation: F.Ç., O.T., M.O., G.E.; Literature Search: F.Ç., T.O., G.E.; Writing: F.Ç., O.T., T.O., G.E.
Conflict of Interest: No conflict of interest was declared by the authors.
Financial Disclosure: The authors declared that this study received no financial support.