ABSTRACT
Objective
To evaluate the feasibility of using deep learning models applied to digital breast tomosynthesis (DBT) images for non-invasive prediction of breast cancer biomarkers, including estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), Ki-67 proliferation index, and triple-negative breast cancer (TNBC).
Materials and Methods
In this retrospective study, patients with histopathologically confirmed invasive breast cancer and complete immunohistochemically assessed biomarker data were included. For each case, a representative DBT slice showing the tumor was selected and preprocessed using histogram equalization. Two pretrained convolutional neural networks (VGG19 and ResNet50) were fine-tuned for binary classification of each biomarker. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve (AUC), F1 score, and Matthews correlation coefficient.
Results
The study sample included 43 anonymized female patients. Deep learning models achieved strong predictive performance for ER (AUC = 0.81) and TNBC (AUC = 0.93). HER2 (AUC = 0.74) and Ki-67 index (AUC = 0.70) were predicted with moderate accuracy. PR results varied, with VGG19 reaching AUC = 0.76 while ResNet50 performed poorly (AUC = 0.24).
Conclusion
Deep learning models applied to DBT images enabled non-invasive prediction of some key breast cancer biomarkers, especially ER status and TNBC type. This approach may function as a virtual biopsy to complement histopathology, guide biopsy targeting, and support treatment planning. Although preliminary, the findings highlight the potential of artificial intelligence-enhanced DBT assessment and warrant validation in larger, multi-center prospective studies.
KEY POINTS
• Breast cancer
• Digital breast tomosynthesis
• Deep learning
• Receptor status
• Artificial intelligence
Introduction
Breast cancer remains a significant global health burden and continues to be the most frequently diagnosed cancer in women. As reported by GLOBOCAN 2022, there were an estimated 2.31 million new breast cancer cases globally, accounting for 23.8% of all female cancers. Furthermore, breast cancer led to approximately 662,000 deaths worldwide, making up 15.4% of all female cancer-related deaths and 6.9% of total cancer mortality across both sexes (1).
Early diagnosis is essential for improving clinical outcomes. Digital breast tomosynthesis (DBT), a quasi-3D imaging modality, has been shown to improve lesion detection and diagnostic performance compared with conventional digital mammography (2, 3). Large screening studies have also demonstrated that DBT reduces recall rates, particularly in women with dense breast tissue (4-6). Tissue-based biomarker assessment may be affected by sampling variability and limited representation of tumor heterogeneity, while imaging-based approaches may offer complementary information to support pathological evaluation (7-9).
Beyond its role in early detection, the full potential of DBT may be realized when combined with artificial intelligence (AI) and deep learning (DL) approaches. Recent evidence suggests that convolutional neural networks (CNNs) applied to DBT images can enhance lesion characterization, support cancer detection, and deliver reproducible diagnostic outputs (7). These findings suggest that DBT-based imaging tools may be clinically useful beyond diagnostic applications by providing imaging-derived information relevant to prognostic assessment.
In routine oncology practice, biomarkers such as estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), and Ki-67 play a central role in risk stratification, therapeutic decision-making, and individualized treatment planning. These markers are typically assessed through immunohistochemistry (IHC) and guide the use of endocrine therapies, HER2-targeted agents, and chemotherapy (10). Incorporating molecular profiling into imaging analysis may therefore provide an opportunity to link diagnostic imaging with precision oncology, supporting more informed and individualized treatment strategies.
To the best of our knowledge, this study is among the first to combine DBT slice data with CNNs for imaging-derived prediction of multiple breast cancer biomarkers, representing a novel step toward molecular profiling and precision oncology.
Materials and Methods
Dataset and Study Design
This retrospective, single-center study was conducted at University of Health Sciences Türkiye, Başakşehir Çam and Sakura City Hospital with approval from the local ethics committee (approval number: KAEK/2023.04.134, date: 05.04.2023). The cohort included anonymized female patients with histopathologically confirmed breast cancer diagnosed between January 2020 and December 2021. DBT scans were acquired using the Giotto Class 3000 DBT System (IMS Giotto S.p.A., Via Toscanini 56, 40055 Budrio, Bologna, Italy) in the mediolateral oblique (MLO) view, with each volume consisting of 60 to 80 slices. Because of the retrospective nature of the study and the use of anonymized data, the requirement for informed consent was waived by the local ethics committee.
Inclusion criteria were:
• Histopathological confirmation of breast cancer prior to treatment
• Availability of complete biomarker data based on IHC analysis (ER, PR, HER2, Ki-67)
• Lesions clearly visible on DBT
Exclusion criteria were:
• T0-stage tumors or ductal carcinoma in situ
• Lesions not detectable on DBT
• DBT images of inadequate quality for analysis
• Incomplete histopathology or imaging data
The dataset was divided into training (70%) and testing (30%) sets using stratified sampling to preserve the distribution of biomarker subtypes.
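A per-stratum 70/30 split can be sketched as follows. This is a minimal illustration, assuming one sample per patient and a single hypothetical stratification key per patient (for example, a combined subtype string); the study does not state which label or library was used for stratification. If several slices per patient are used, the split should be performed on patient identifiers so that no patient contributes images to both sets.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.3, seed=42):
    """Split sample indices into train/test sets, preserving label proportions.

    `labels` holds one (hypothetical) stratification key per patient.
    Returns (train_idx, test_idx) as sorted integer arrays.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for value in np.unique(labels):
        # Shuffle the indices of this stratum independently of the others
        idx = rng.permutation(np.flatnonzero(labels == value))
        n_test = int(round(test_fraction * idx.size))
        test_idx.extend(idx[:n_test].tolist())
        train_idx.extend(idx[n_test:].tolist())
    return (np.sort(np.array(train_idx, dtype=int)),
            np.sort(np.array(test_idx, dtype=int)))
```

scikit-learn's `train_test_split(X, y, test_size=0.3, stratify=y)` provides similar behavior; the explicit loop above simply makes the per-stratum rounding visible.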
Pathological Evaluation Criteria
The following prognostic biomarkers were evaluated (10-12):
• ER: considered positive when ≥1% of tumor cell nuclei exhibited IHC staining.
• PR: considered positive when ≥1% of tumor cell nuclei stained with IHC.
• HER2: defined as positive when IHC scored 3+ or when gene amplification was confirmed by fluorescence in situ hybridization, as per American Society of Clinical Oncology/College of American Pathologists guidelines (11).
• Ki-67: a cut-off value of ≥20% was used to define high proliferative activity, in accordance with widely accepted clinical guidelines (10).
• Triple-negative breast cancer (TNBC): defined as ER-negative, PR-negative, and HER2-negative status according to the criteria above.
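The biomarker criteria listed above can be expressed as a small labeling function. This is a hedged sketch: the `IHCResult` record and its field names are hypothetical, and an IHC 2+ case without a FISH result is treated as HER2-negative here purely for illustration, whereas in practice such equivocal cases require reflex testing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class IHCResult:
    er_percent: float                     # % of tumor nuclei with ER staining
    pr_percent: float                     # % of tumor nuclei with PR staining
    her2_ihc_score: int                   # 0, 1, 2 or 3 (i.e. 0, 1+, 2+, 3+)
    her2_fish_amplified: Optional[bool]   # None if FISH was not performed
    ki67_percent: float                   # % of Ki-67-positive tumor nuclei

def binary_labels(r: IHCResult) -> dict:
    """Map IHC/FISH results to the binary targets used for classification."""
    er_pos = r.er_percent >= 1.0
    pr_pos = r.pr_percent >= 1.0
    # HER2-positive: IHC 3+ or confirmed gene amplification by FISH
    her2_pos = r.her2_ihc_score == 3 or r.her2_fish_amplified is True
    return {
        "ER": int(er_pos),
        "PR": int(pr_pos),
        "HER2": int(her2_pos),
        "Ki67_high": int(r.ki67_percent >= 20.0),
        # TNBC: negative for all three receptors
        "TNBC": int(not (er_pos or pr_pos or her2_pos)),
    }
```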
For patients with multifocal or multicentric disease, lesion selection was performed by a breast radiologist with 10 years of experience according to a predefined protocol. The largest lesion was preferentially selected for analysis; however, if it was not clearly visualized on DBT, the lesion demonstrating the highest morphological conspicuity was chosen as the representative lesion.
DBT Image Preprocessing
DBT scans were acquired in right mediolateral oblique and left mediolateral oblique views. For each patient, the affected breast was selected for analysis. Each DBT study consisted of 60–80 slices stored as DICOM files. The representative DBT slice was selected by a breast radiologist with 10 years of experience according to a predefined protocol that prioritized slices demonstrating clear lesion visualization, well-defined lesion margins, and maximal lesion conspicuity within the DBT stack. To ensure compatibility with DL workflows, images were converted from DICOM to PNG format while preserving the original spatial resolution and pixel dimensions. Histogram equalization was then applied to improve contrast and emphasize structural differences between dense and fatty tissues. This preprocessing step enhanced visualization of lesion margins and improved overall parenchymal contrast (Figure 1). Histogram equalization was applied uniformly to all images across both training and testing datasets.
The preprocessing pipeline included:
• Grayscale conversion: reduced input complexity by representing each pixel with a single channel.
• Histogram equalization: increased contrast to better highlight tumor structures.
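Both pipeline steps assume an 8-bit, single-channel image with 256 gray levels. The sketch below shows one way a DBT slice might be brought into that form, assuming linear min-max rescaling from the higher bit depth typical of DICOM pixel data; the study does not describe how bit depth was handled, and DICOM windowing (VOI LUT) would be a reasonable alternative. With pydicom, the input array could come from `pydicom.dcmread(path).pixel_array`; the function name is hypothetical.

```python
import numpy as np

def to_uint8_grayscale(pixels: np.ndarray) -> np.ndarray:
    """Return a single-channel uint8 image with intensities in [0, 255]."""
    img = np.asarray(pixels, dtype=np.float64)
    if img.ndim == 3 and img.shape[-1] == 3:
        # Collapse RGB to one channel with ITU-R BT.601 luma weights
        img = img @ np.array([0.299, 0.587, 0.114])
    elif img.ndim != 2:
        raise ValueError("expected an HxW or HxWx3 array")
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: nothing to rescale
        return np.zeros(img.shape, dtype=np.uint8)
    # Linear min-max rescaling to 256 gray levels (an assumption, see above)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)
```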
The normalized image histogram was defined as follows. Let a digital image be represented by $f$, with pixel intensity values $f_k$ ranging from $0$ to $L-1$; for grayscale images, $L = 256$, the total number of possible intensity levels. The normalized histogram is given by Equation (1), where $n_k$ is the number of pixels with intensity $f_k$ and $n$ is the total number of pixels in the image. Contrast enhancement was achieved using the transformation function in Equation (2), which redistributes intensity values across the image according to the cumulative histogram, particularly improving visibility in low-contrast regions (13-17).
Equation (1): $p_f(f_k) = \dfrac{n_k}{n}, \quad k = 0, 1, \ldots, L-1$
Equation (2): $T(f_k) = (L-1)\sum_{j=0}^{k} p_f(f_j)$
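Equations (1) and (2) translate directly into a lookup table built from the cumulative histogram. The following is a minimal NumPy sketch, assuming a 2D integer image with values in [0, L − 1]. Rounding $T(f_k)$ to the nearest integer level is an assumption, since the text does not specify how the continuous transform was quantized.

```python
import numpy as np

def histogram_equalize(image: np.ndarray, levels: int = 256) -> np.ndarray:
    """Global histogram equalization following Equations (1) and (2)."""
    image = np.asarray(image)
    if image.ndim != 2 or not np.issubdtype(image.dtype, np.integer):
        raise ValueError("expected a 2D integer (grayscale) image")
    if image.min() < 0 or image.max() > levels - 1:
        raise ValueError("pixel values must lie in [0, L - 1]")
    # Equation (1): p_f(f_k) = n_k / n for every gray level k = 0..L-1
    n_k = np.bincount(image.ravel(), minlength=levels)
    p_f = n_k / image.size
    # Equation (2): T(f_k) = (L - 1) * sum_{j=0}^{k} p_f(f_j)
    T = (levels - 1) * np.cumsum(p_f)
    # Quantize to integer gray levels (rounding convention is an assumption)
    lut = np.round(T).astype(np.uint8 if levels <= 256 else np.uint16)
    # Map every pixel value f_k to T(f_k) through the lookup table
    return lut[image]
```

For a 2 × 2 image `[[0, 0], [1, 2]]`, the cumulative probabilities are 0.5, 0.75, and 1.0, so the output levels are 128 (127.5 rounded half to even), 191, and 255.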
These preprocessing steps enhanced both anatomical and pathological feature visibility, which is critical for extracting meaningful patterns during DL model training. Images were then manually labeled according to five biomarker categories: ER status, PR status, HER2 status, TN molecular subgroup, and Ki-67 proliferation index.
Data Augmentation Techniques
To reduce the risk of overfitting and enhance the model’s ability to generalize to unseen data, several data augmentation strategies were applied to the training set. These techniques introduced controlled variability while maintaining the integrity of lesion morphology.
The following augmentations were performed:
• Random rotations (±15°)
• Horizontal and vertical translations (up to 10%)
• Zooming in or out (±10%)
• Horizontal flipping
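The augmentations listed above can be sketched as one random affine transform per training image. This is an illustrative NumPy version with nearest-neighbour sampling and zero fill for pixels that fall outside the slice. The study does not state which library, interpolation, or fill mode was used, and the function names are hypothetical.

```python
import numpy as np

def sample_augmentation(rng, shape, max_rot_deg=15.0, max_shift=0.10, max_zoom=0.10):
    """Draw one random set of augmentation parameters within the stated ranges."""
    h, w = shape
    return dict(
        angle_deg=rng.uniform(-max_rot_deg, max_rot_deg),  # rotation within ±15°
        tx=rng.uniform(-max_shift, max_shift) * w,         # horizontal shift in pixels
        ty=rng.uniform(-max_shift, max_shift) * h,         # vertical shift in pixels
        scale=rng.uniform(1 - max_zoom, 1 + max_zoom),     # zoom in/out by up to 10%
        flip=bool(rng.random() < 0.5),                     # horizontal flip
    )

def augment(image, angle_deg=0.0, tx=0.0, ty=0.0, scale=1.0, flip=False, fill=0):
    """Rotate and zoom about the image centre, translate, then optionally flip.

    Uses inverse mapping with nearest-neighbour sampling; output pixels whose
    source location lies outside the image are set to `fill`.
    """
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    # Undo the translation, then undo rotation and zoom about the centre
    x, y = xx - cx - tx, yy - cy - ty
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    src_x = np.rint((c * x + s * y) / scale + cx).astype(int)
    src_y = np.rint((-s * x + c * y) / scale + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(image, fill)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out[:, ::-1].copy() if flip else out
```

In TensorFlow, Keras preprocessing layers such as `tf.keras.layers.RandomRotation`, `RandomTranslation`, `RandomZoom`, and `RandomFlip("horizontal")` provide comparable operations; `RandomRotation` expects its range as a fraction of a full turn, so ±15° corresponds to a factor of about 15/360 ≈ 0.042. As stated above, augmentation applies to the training set only.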
Deep Learning Architecture and Visualization
The most representative slice showing the tumor was selected from the DBT stack. This deliberate choice mirrors histopathology practice, where biomarker assessment is typically performed on the most informative or aggressive field. Selecting the slice with maximal lesion conspicuity reduced background noise, highlighted biologically relevant features, and ensured computational feasibility for this proof-of-concept study. Histogram equalization was then applied to enhance lesion contrast and improve boundary definition, particularly in dense breast tissue. The preprocessed, histogram-equalized slices were subsequently used to train and evaluate two CNN architectures.
The two CNN architectures used were VGG19 and ResNet50. These architectures were selected for their established effectiveness in medical image analysis and their ability to extract hierarchical features from high-resolution radiological data. Both architectures, illustrated in Figure 2, are composed of multiple stacked convolutional layers, max-pooling operations, and fully connected layers.
The VGG19 network (18, 19) consisted of 16 convolutional layers followed by three fully connected layers. It employed ReLU activation, batch normalization, dropout (rate: 0.3), and max-pooling operations throughout the architecture, with a final sigmoid activation for binary classification. The model was optimized using the Adam algorithm (learning rate = 0.0001) with binary cross-entropy loss and was trained for 10 epochs with a batch size of 32 in TensorFlow 2.13.
For the ResNet50 model (18, 19), a ResNet50 network pretrained on ImageNet was used as the backbone for feature extraction. The original classification head was replaced with a GlobalMaxPooling2D layer, followed by fully connected dense layers with ReLU activations and a final sigmoid output. Residual blocks were preserved to maintain gradient flow and support deeper network training.
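A hedged Keras sketch of the two classifiers described above follows. Several details are assumptions not reported in the text: the input size (224 × 224 × 3, with the grayscale slice replicated to three channels), the dense-layer widths (256 and 128), the position of batch normalization, and the reuse of the VGG19 optimizer settings for ResNet50. `build_classifier` is a hypothetical helper.

```python
import tensorflow as tf

def build_classifier(backbone_name="vgg19", input_shape=(224, 224, 3), weights="imagenet"):
    """Pretrained CNN backbone with a binary (sigmoid) head.

    Head widths, input size and three-channel input are illustrative
    assumptions, not values reported in the study.
    """
    layers = tf.keras.layers
    if backbone_name == "vgg19":
        # include_top=False keeps only the 16 convolutional layers of VGG19
        backbone = tf.keras.applications.VGG19(
            include_top=False, weights=weights, input_shape=input_shape)
        x = layers.Flatten()(backbone.output)
        # Two hidden dense layers + the output layer = three fully connected layers
        for units in (256, 128):
            x = layers.Dense(units, activation="relu")(x)
            x = layers.BatchNormalization()(x)
            x = layers.Dropout(0.3)(x)
    elif backbone_name == "resnet50":
        backbone = tf.keras.applications.ResNet50(
            include_top=False, weights=weights, input_shape=input_shape)
        # Replace the ImageNet head with global max pooling + a dense layer
        x = layers.GlobalMaxPooling2D()(backbone.output)
        x = layers.Dense(256, activation="relu")(x)
    else:
        raise ValueError(f"unknown backbone: {backbone_name}")
    output = layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(backbone.input, output)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )
    return model
```

Training as described would then call `model.fit(...)` for 10 epochs with batches of 32. A grayscale slice can be expanded to three channels with `tf.image.grayscale_to_rgb`, and the `preprocess_input` functions in `tf.keras.applications.vgg19` and `tf.keras.applications.resnet50` apply the input normalization expected by the pretrained weights.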
Statistical Analysis
To evaluate the robustness and generalizability of the classification models, a comprehensive statistical analysis was performed. For binary classification tasks, performance was quantified using sensitivity, specificity, precision, negative predictive value, false negative rate, false discovery rate, overall accuracy, F1 score, and Matthews correlation coefficient (MCC).
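All of the threshold-based metrics listed above derive from the four cells of the binary confusion matrix. The following is a minimal pure-Python sketch with a hypothetical function name. Undefined ratios, such as precision when no positive predictions are made, are returned as NaN rather than raising an error.

```python
import math

def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Threshold-based metrics from binary confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        # Return NaN instead of dividing by zero
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),          # true negative rate
        "precision": ratio(tp, tp + fp),            # positive predictive value
        "npv": ratio(tn, tn + fn),                  # negative predictive value
        "fnr": ratio(fn, fn + tp),                  # false negative rate = 1 - sensitivity
        "fdr": ratio(fp, fp + tp),                  # false discovery rate = 1 - precision
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),      # harmonic mean of precision and recall
        "mcc": ratio(tp * tn - fp * fn, mcc_den),   # Matthews correlation coefficient
    }
```

For example, 40 true positives, 10 false positives, 40 true negatives, and 10 false negatives give sensitivity, specificity, accuracy, and F1 of 0.8 and an MCC of (1600 − 100)/2500 = 0.6.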
Probabilistic model performance was assessed using the area under the receiver operating characteristic (ROC) curve (AUC) and log loss. Comparisons between models were conducted using DeLong’s test to determine statistical differences in AUC, with significance defined as p<0.05.
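DeLong's test compares two AUCs measured on the same test cases by estimating their covariance from per-case "placement" values. The following is a minimal NumPy sketch of the paired, two-sided version. The study does not state which implementation was used, and the function names here are hypothetical.

```python
import math
import numpy as np

def _placements(pos, neg):
    """Structural components V10 (one per positive) and V01 (one per negative)."""
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0) + 0.5 * (diff == 0)   # 1 if ranked correctly, 0.5 for ties
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_paired_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs on the same cases.

    Returns (auc_a, auc_b, z, p_value); z and p are NaN if the variance is zero.
    """
    y = np.asarray(y_true).astype(bool)
    v10, v01 = [], []
    for s in (np.asarray(scores_a, float), np.asarray(scores_b, float)):
        a, b = _placements(s[y], s[~y])
        v10.append(a)
        v01.append(b)
    v10, v01 = np.vstack(v10), np.vstack(v01)
    m, n = v10.shape[1], v01.shape[1]
    aucs = v10.mean(axis=1)
    # Covariance matrix of the two AUC estimates: S = S10 / m + S01 / n
    S = np.cov(v10) / m + np.cov(v01) / n
    var_diff = S[0, 0] + S[1, 1] - 2 * S[0, 1]
    if var_diff <= 0:
        # Degenerate case (e.g. identical predictions): the test is undefined
        return float(aucs[0]), float(aucs[1]), float("nan"), float("nan")
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return float(aucs[0]), float(aucs[1]), float(z), p
```

As a worked example, with three positives and three negatives, model A separates the classes perfectly (AUC = 1) while model B ranks one positive below one negative (AUC = 8/9). The variance of the difference is then 2/81, giving z = 1/√2 and p ≈ 0.48.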
To estimate uncertainty in model performance, 95% confidence intervals (CIs) for AUC values were calculated through stratified bootstrap resampling with 1,000 iterations. All statistical analyses were conducted in Python 3.11.4, using scikit-learn, TensorFlow, NumPy, and associated packages for ROC curve analysis and bootstrap-based CI estimation (20, 21).
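The stratified bootstrap resamples positives and negatives separately, so every replicate keeps the class balance of the test set and always contains both classes. Below is a minimal sketch using the percentile interval. The interval type is an assumption, since the text does not specify it. The AUC is computed via the Mann-Whitney formulation, which matches scikit-learn's `roc_auc_score` for binary labels.

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]
    # Correctly ranked pairs count 1, ties count 0.5
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def stratified_bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the AUC, resampling positives and negatives separately."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pos_idx, neg_idx = np.flatnonzero(y), np.flatnonzero(~y)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Stratified resampling keeps the class counts of the original test set
        idx = np.concatenate([
            rng.choice(pos_idx, size=pos_idx.size, replace=True),
            rng.choice(neg_idx, size=neg_idx.size, replace=True),
        ])
        boot[b] = auc_mann_whitney(y[idx], s[idx])
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_mann_whitney(y, s), (float(lo), float(hi))
```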
Results
This retrospective study included 43 patients diagnosed with breast cancer. From their DBT acquisitions, a total of 960 lesion-focused slices were selected. The dataset was split into 70% training and 30% testing cohorts, ensuring balanced distributions of prognostic markers (no statistically significant differences, p>0.05 by chi-square test).
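For a single binary biomarker, the balance check between training and testing sets can be illustrated with a 2 × 2 contingency table. The following is a minimal sketch of the uncorrected Pearson chi-square test. The text does not state whether a continuity correction was used, and any counts passed to it here are hypothetical. For reference, `scipy.stats.chi2_contingency` applies Yates' correction by default when there is one degree of freedom.

```python
import math
import numpy as np

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, no continuity correction.

    Rows: training / testing set; columns: biomarker-negative / -positive counts.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    obs = np.asarray(table, dtype=float)
    if obs.shape != (2, 2):
        raise ValueError("expected a 2x2 contingency table")
    # Expected counts under independence: row total * column total / grand total
    expected = obs.sum(axis=1, keepdims=True) * obs.sum(axis=0, keepdims=True) / obs.sum()
    stat = float(((obs - expected) ** 2 / expected).sum())
    # With 1 df, chi2 = Z^2, so P(chi2 > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))
```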
Hormonal Receptor (ER, PR) Status
Model performance varied notably between ER and PR receptor classification tasks. Both VGG19 and ResNet50 demonstrated similar effectiveness for ER detection, achieving an AUC of 0.81 (95% CI: 0.78–0.84), with identical F1 scores (0.81) and MCC values (0.6194). No statistically significant difference was found between the models (p = 0.83), indicating consistent performance across architectures.
In contrast, PR classification revealed substantial discrepancies. VGG19 achieved an AUC of 0.76 (95% CI: 0.71–0.80), with an F1 score of 0.76 and MCC of 0.52, indicating moderate predictive strength. However, ResNet50 failed to generalize for PR status, with an AUC of only 0.24. Because an AUC of 0.5 corresponds to random ranking, a value this far below 0.5 indicates that the model systematically ranked PR-negative cases above PR-positive cases rather than simply lacking signal. This significant performance gap (p<0.001 vs. VGG19) may indicate overfitting, architecture mismatch, or a lack of sensitivity to PR-specific morphological features in ResNet50. Among luminal tumors, there were no ER-positive/PR-negative cases in our cohort, which precluded meaningful subgroup discrimination and resulted in marked class imbalance. PR status was not evaluated as an independent primary predictive target in this study.
These results are summarized numerically in Table 1 and visually represented in Figure 3, which compares classification metrics (accuracy, AUC, F1 score, MCC) across both models for ER and PR tasks.
HER2 Receptor Expression
HER2 classification using the ResNet50 model resulted in moderate predictive performance, with an AUC of 0.74 (95% CI: 0.70–0.78), an F1 score of 0.74, and an MCC of 0.48. Comparative ROC analysis revealed that the ability of the model to discriminate HER2 status was significantly inferior to ER classification (p = 0.047). This reduced performance may stem from the heterogeneous imaging phenotypes characteristic of HER2-positive tumors, which often exhibit variable morphological and biological features. These results reflect the complexity of modeling HER2 expression using imaging data and underscore the challenges of non-invasive biomarker prediction in biologically diverse tumor subtypes.
Triple-Negative Subgroup
Classification of TNBC showed the highest predictive performance across all biomarker categories. Both CNN architectures (VGG19 and ResNet50) achieved an AUC of 0.92 (95% CI: 0.90–0.94), an F1 score of 0.92, and an MCC of 0.85. The narrow bootstrap confidence intervals further support the consistency of model performance within this test set. Pairwise ROC comparisons demonstrated that identification of TNBC by both CNN architectures significantly outperformed their performance for HER2, PR, and Ki-67 predictions (p<0.001 for all comparisons). These findings highlight the potential of DBT-based DL tools for identifying TNBC. Performance metrics are provided in Table 2, and model trends are illustrated in Figure 4.
Ki-67 Proliferation Index
The predictive modeling of Ki-67 expression yielded moderate classification success. The VGG19 architecture slightly outperformed ResNet50, with accuracy scores of 68.9% and 68.5%, respectively. VGG19 produced an AUC of 0.70 (95% CI: 0.64–0.75), with balanced F1 and MCC values, suggesting improved generalizability. While these metrics were not high, the AUC values remained statistically superior to random guessing (p<0.01). These findings imply that while CNNs may capture certain proliferation-associated patterns from DBT images, further model refinement or multimodal approaches may be needed to improve the accuracy of non-invasive Ki-67 prediction.
Discussion and Conclusion
This study demonstrated the feasibility of imaging-based prediction of key prognostic biomarkers in breast cancer (ER, PR, HER2, Ki-67, and TNBC status) from DBT images analyzed with two CNN architectures, VGG19 and ResNet50 (11, 22). These models extracted morphological features from DBT images that reflected some aspects of the underlying tumor biology.
Among the evaluated biomarkers, TNBC yielded the highest predictive performance (AUC: 0.92, F1-score: 0.93), followed by ER status (AUC: 0.81). Predictions for HER2 and Ki-67 achieved moderate performance, while substantial discordance was observed in PR classification between the two CNNs: VGG19 achieved an AUC of 0.76, whereas ResNet50 reached only 0.24. This poor performance by ResNet50 may stem from insensitivity to fine texture-based features, class imbalance, or overfitting (20). Comparable findings have been reported in the literature using other imaging modalities. For instance, Dominique et al. (13) used images from contrast-enhanced spectral mammography (CEM) to predict ER and TN status with AUCs of 0.85 and 0.91, respectively. In contrast, the current study achieved comparable results without the use of contrast agents, offering a potential advantage in patients with contraindications to contrast administration.
Prior DBT-based studies have primarily focused on malignancy detection and lesion classification. Bevilacqua et al. (22) demonstrated the superiority of deep CNNs over shallow models, and El-Shazli et al. (23) proposed a 3D DBT-based DL diagnostic pipeline. Our study contributes to this body of work by extending the use of DL beyond binary lesion classification to derive biologically meaningful output that may contribute to personalized treatment planning.
Importantly, our study adopted a 2D slice-based approach rather than volumetric analysis. This deliberate choice was motivated by computational feasibility and the premise that a carefully selected, representative slice may sufficiently capture the biological aggressiveness of the tumor. In histopathology practice, biomarker evaluation is often performed on the most representative or biologically aggressive portion of the specimen (10), reflecting the area most relevant to clinical behavior. By analogy, selecting DBT slices with maximal lesion conspicuity may highlight biologically meaningful features while reducing background noise. Similar single-slice or representative-field strategies have been successfully applied in prior radiomics and AI studies in breast imaging and neuro-oncology, further supporting the validity of this approach. Nevertheless, future work should explore multi-slice fusion or 3D CNNs, which may provide incremental value by incorporating spatial context and capturing intratumoral heterogeneity (21-26). Consistent with this perspective, Zhang et al. (27) demonstrated that 2D CNNs applied to selected slices from 3D DBT volumes may achieve reliable classification performance, supporting the feasibility of slice-based DL strategies when volumetric analysis is computationally constrained.
The diagnostic potential of DBT has also been reported by Ricciardi et al. (28), who demonstrated high accuracy for malignancy detection, and by Shimokawa et al. (29), who used CNNs to predict stromal invasion based on DBT images. Collectively, these studies suggest that DBT captures rich morphological information extending beyond lesion presence or invasion status. In this context, the present study further broadens the scope of DBT by leveraging its morphological capacity for biological and molecular profiling. The high prediction accuracy observed for TNBC and ER status may be attributed to their distinct imaging characteristics. For instance, TNBCs frequently appear as high-density masses with ill-defined margins on DBT, which may be readily captured by CNN models (3, 5). In contrast, the moderate performance for HER2 and Ki-67 may reflect their heterogeneous imaging appearance or features that fall below the spatial resolution of DBT. A meta-analysis by Yoon et al. (7) supported the use of DBT over digital mammography in AI applications, reporting pooled AUCs of up to 0.91 and improved performance with deeper architectures, such as ResNet and VGG (18). These findings are in line with our results, particularly the superior performance of VGG19 in PR and Ki-67 prediction.
Prediction of the Ki-67 proliferation index yielded moderate success in this study (AUC = 0.70, 95% CI: 0.64–0.75), with balanced F1 and MCC scores indicating reasonable generalizability. These findings are consistent with previous literature: Dominique et al. (13) reported an AUC of approximately 0.72 for Ki-67 prediction from CEM images using DL models. Even with contrast-based imaging, their results suggest that Ki-67 remains a challenging biomarker to predict from imaging alone. Unlike ER or TNBC status, Ki-67 does not consistently produce distinct radiological patterns, particularly on DBT, which may explain the moderate performance observed. Nevertheless, the AUC was statistically better than chance, supporting the hypothesis that certain morphological patterns associated with cellular proliferation are partially captured in DBT images and can be recognized by deep CNNs (13, 22, 23).
From a clinical perspective, imaging-based biomarker prediction aligns with the evolving concept of a “virtual biopsy”. Such approaches may assist in biopsy targeting, inform therapeutic decision-making in scenarios with limited tissue availability, and enable non-invasive longitudinal monitoring (3, 7). Moreover, recent studies have provided complementary evidence supporting our DBT-CNN framework for biologically meaningful prediction. do Nascimento et al. (30) identified members of the Pleckstrin Homology-Like Domain, Family B as prognostic and predictive biomarkers, highlighting the value of molecular signatures for treatment stratification. Mundinger and Mundinger (31) demonstrated that AI-assisted image analysis may reduce workload and recall rates in breast screening, reinforcing the practicality of integrating AI with DBT to generate clinically actionable outputs, consistent with our CNN-based approach for deriving molecular surrogates.
Collectively, these studies support the premise that imaging-derived biomarkers may be capable of extending precision oncology beyond histopathology and may serve as the foundation for future virtual-biopsy-driven clinical workflows. Future investigations should explore volumetric approaches, such as 3D CNNs or multi-slice fusion, that might better capture intratumoral heterogeneity and subtle architectural cues associated with proliferative activity, including Ki-67 expression. Moreover, integrating DBT with complementary imaging modalities, such as MRI or ultrasound, may further enhance predictive performance through multi-modal DL frameworks.
Study Limitations
This study has certain limitations that merit acknowledgment. Its retrospective, single-center design and relatively small sample size may limit the generalizability of the findings. The deliberate use of a single representative DBT slice per patient focused our analysis on the most morphologically informative regions and avoided the confounding impact of excessive heterogeneity or poorly defined areas; however, it also means that intratumoral heterogeneity across the remaining slices of the DBT volume was not captured. Supporting the region-focused strategy, Hossain et al. (32) demonstrated that automatic region of interest detection in histopathological imaging can significantly enhance accuracy by concentrating computational focus on relevant areas and reducing noise (33). Given the relatively small sample size, this study should be regarded as exploratory and proof-of-concept in nature. Although data augmentation was applied, the risk of model overfitting cannot be fully excluded, and the reported performance metrics should be interpreted with caution. Future ablation studies comparing histogram equalization, contrast-limited adaptive histogram equalization, and raw images are warranted.
This study demonstrated that imaging-based prediction of key prognostic biomarkers in breast cancer is feasible using CNNs (VGG19 and ResNet50) applied to selected DBT images. High predictive performance was achieved for ER status and TNBC status, and moderate performance was observed for HER2 status and the Ki-67 index. Notably, PR prediction varied substantially between CNN models. These findings suggest that DBT combined with DL may provide complementary imaging-based information to support clinical decision-making, rather than replacing tissue-based profiling. Clinical translation and integration into decision-support workflows will require confirmation in large-scale, multi-center, prospective studies.
Clinical Relevance Statement
The integration of selected DBT images with DL enabled prediction of key histopathological biomarkers, such as ER, PR, HER2, TNBC status, and the Ki-67 index, with varying degrees of accuracy, and such predictions may be integrated into clinically guided workflows. Imaging-based biomarker prediction has the potential to improve biopsy targeting, guide therapy selection, and support treatment planning, particularly in cases with limited or unavailable tissue, although more evidence is required before widespread acceptance. We believe external validation in larger, multi-institutional cohorts is warranted; nonetheless, our findings support the growing role of AI-based DBT image analysis in accelerating individualized care pathways and advancing precision oncology in breast cancer.


