ABSTRACT
Objective:
Breast cancer clinical stage and nodal status are the most clinically significant drivers of patient management, in combination with other pathological biomarkers such as estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status, and tumor grade. Accurate prediction of these parameters can help avoid unnecessary interventions, including unnecessary surgery. The objective was to investigate the role of magnetic resonance imaging (MRI) radiomics in yielding virtual prognostic biomarkers (ER and HER2 expression, tumor grade, molecular subtype, and clinical T- and N-stage).
Materials and Methods:
Patients with primary invasive breast cancer who underwent dynamic contrast-enhanced (DCE) breast MRI between July 2013 and July 2016 at a single center were retrospectively reviewed. Age, N-stage, grade, ER and HER2 status, and Ki-67 (%) were recorded. DCE images were segmented and Haralick texture features were extracted. The Bootstrap Lasso (Bolasso) feature selection method was used to select a small subset of optimal texture features. Classification performance of the final model was assessed with the area under the receiver operating characteristic curve (AUC).
Results:
Median age of patients (n = 209) was 49 (range, 21–79) years. Sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the model were 71%, 79%, 76%, 74% and 75% for differentiating N0 vs N1–N3 [AUC = 0.78 (95% confidence interval (CI) 0.72–0.85)]; 81%, 59%, 24%, 95% and 62% for N0–N1 vs N2–N3 [AUC = 0.74 (95% CI 0.63–0.85)]; 79%, 48%, 34%, 87% and 56% for distinguishing HER2(+) from HER2(-) [AUC = 0.64 (95% CI 0.54–0.73)]; and 56%, 88%, 96%, 29% and 61% for high nuclear grade (grades 2–3) vs low grade (grade 1) [AUC = 0.71 (95% CI 0.63–0.80)]. For ER(+) vs ER(-) status, the AUC was 0.67 (95% CI 0.59–0.76). AUCs for distinguishing triple-negative vs other molecular subtypes and Luminal A vs other subtypes were 0.60 (95% CI 0.49–0.71) and 0.66 (95% CI 0.56–0.76), respectively.
Conclusion:
Quantitative radiomics using MRI contrast texture shows promise in identifying aggressive, high-grade, node-positive triple-negative breast cancer, and correlates well with higher nuclear grades, higher T-stages, and N-positive stages.
Key Point
• The accuracy of the presented radiomics model is 75% for distinguishing N0 from N1–N3 cases and 62% for differentiating N0–N1 from N2–N3 cases. Furthermore, the model achieved an area under the curve of 0.71 for identifying high nuclear grade (grades 2–3) versus low grade (grade 1) cases.
Introduction
Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among women (1). Multiple factors affect prognosis, including patient age, tumor size, type and grade, and lymph node status (2, 3, 4, 5). In recent years, estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2) status have emerged as important molecular biomarkers in staging breast cancer and in guiding treatment decisions regarding hormonal and targeted therapies, neoadjuvant chemotherapy, or upfront surgery (2, 6). Triple-negative breast cancer (TNBC) is associated with poor prognosis and decreased survival (7, 8), whereas targeted therapies improve outcomes in receptor-positive breast cancer (9, 10). ER, PR and HER2 status is determined by immunohistochemical analysis of individual biopsy samples using well-established protocols (11, 12). However, because of intra-tumoral heterogeneity within the primary lesion and inter-tumoral heterogeneity between the primary cancer and its metastases, incisional biopsy results may not be representative of the whole tumor (13, 14). A non-invasive method for evaluating tumor biomarkers may be useful for detecting heterogeneity and could serve as a clinical decision support tool throughout the continuum of patient care, from detection to adjuvant therapy.
Metastatic status of the axillary lymph nodes is an important prognostic marker that guides therapy in newly diagnosed breast cancer (6). Sentinel lymph node biopsy (SLNB) is the mainstay for evaluating axillary lymph node metastasis, but it is invasive and associated with morbidity (15). Preoperative imaging assessment of the axillary lymph nodes may help avoid SLNB in some cases (15, 16, 17, 18, 19, 20). Physical examination, mammography, breast ultrasound and fine needle aspiration biopsy provide limited sensitivity and specificity in the assessment of axillary lymph nodes and cannot reliably obviate SLNB (15, 17, 18, 19, 20, 21, 22). Dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) has been reported to be the most accurate method of evaluating disease extent (23, 24, 25), and it also allows assessment of the axillary and internal mammary nodes for metastatic disease. However, its sensitivity and negative predictive value (NPV) for detecting axillary lymph node metastasis are modest, at 80% and 60%, respectively (15, 20, 25, 26).
Textural kinetics are quantitative imaging features that describe the dynamic variation of textural features of breast lesions during contrast material uptake and can outperform standard morphologic, static texture, and kinetic intensity features in the differentiation of benign and malignant lesions (27). Textural heterogeneity on MRI correlates with histopathological tumor heterogeneity and shows a positive trend for correlation with prognostic markers such as ER, PR or HER2 positivity, and prognostic scores such as Oncotype Dx or PAM50 (28, 29). Several studies have shown that MRI imaging features are associated with molecular breast cancer subtypes (28, 29, 30, 31, 32). The results of these studies offer a possible framework in which to explore textural features as biomarkers of clinically relevant prognostic indicators.
In this study, we aimed to investigate the potential role of DCE-MRI texture radiomics in identifying virtual prognostic biomarkers for ER, PR and HER2 expression, tumor grade, molecular subtype, and clinical T and N stage.
Materials and Methods
In this institutional review board-approved, Health Insurance Portability and Accountability Act (HIPAA)-compliant study, consecutive patients with primary invasive breast cancer who underwent breast DCE-MRI at our institution between July 2013 and July 2016 were retrospectively reviewed. Age, tumor size (T1–4), regional nodal metastasis (cN1, cN2, cN3) and tumor stage (I–IV) were collected from electronic medical records, as were the treatments each patient received, response to treatment (pathological complete response vs partial response vs stable disease) and residual cancer burden (I–III). Tumor grade (grade 1, 2 or 3), ER status (positive or negative), HER2 status (amplified vs non-amplified) and Ki-67 status (low, intermediate or high) were obtained from pathology reports. American Society of Clinical Oncology/College of American Pathologists criteria were followed in assessing ER, PR, HER2 and Ki-67 positivity. Molecular subtypes were defined based on previously published criteria: Luminal A (ER+ and/or PR+, Ki-67 <14%), Luminal B (Luminal B-HER2-: ER+ and/or PR+, Ki-67 ≥14%; Luminal B-HER2+: ER+ and HER2+, regardless of Ki-67), HER2+ (ER-, PR-, HER2+) and TNBC (triple negative: ER-, PR-, HER2-). The significance of PR expression in the absence of ER expression is unclear, as ER positivity dominates tumor biology and prognosis. In addition, ER expression is the predominant determinant of tumor molecular subtype in the American Joint Committee on Cancer (AJCC) 8th edition classification of tumor subtype and stage (33).
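The subtype criteria above can be expressed as a small decision function. This is a minimal sketch, not the study's code: `molecular_subtype` is a hypothetical helper, it treats hormone-receptor (HR) positivity as ER+ and/or PR+ (including for Luminal B-HER2+, which the text defines as ER+ and HER2+), and it assumes that Luminal A tumors are HER2-negative, which the text leaves implicit.

```python
from typing import Optional


def molecular_subtype(er: bool, pr: bool, her2: bool, ki67: Optional[float]) -> str:
    """Surrogate molecular subtype from pathology fields (hypothetical helper).

    Assumption: hormone-receptor positivity means ER+ and/or PR+.
    """
    hr_positive = er or pr
    if not hr_positive and not her2:
        return "TNBC"             # ER-, PR-, HER2-
    if not hr_positive:
        return "HER2+"            # ER-, PR-, HER2+ (non-luminal)
    if her2:
        return "Luminal B-HER2+"  # HR+ and HER2+, regardless of Ki-67
    if ki67 is None:
        raise ValueError("Ki-67 is needed to split HR+/HER2- tumors")
    # HR+/HER2-: split on the 14% Ki-67 threshold quoted in the text
    return "Luminal A" if ki67 < 14 else "Luminal B-HER2-"
```

Checking HER2 before Ki-67 mirrors the text's rule that Luminal B-HER2+ is assigned regardless of Ki-67.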
Results
Discussion and Conclusion
These results show that quantitative radiomic models can help exclude clinically aggressive disease and predict tumor stage and grade, which may support clinical management decisions. N-stage is one of the most important parameters to predict, because reliable pre-operative, image-based prediction of N-stage could help avoid SLNB, an invasive procedure. A positive nodal status also changes management significantly, as it indicates axillary lymph node dissection. An additional clinical scenario may be the use of this technology as a “tie-breaker” in patients at high surgical risk (because of co-morbidities, age, etc.). Our model achieved 81% sensitivity and 95% NPV in predicting advanced nodal stage (N0–1 vs N2–3) of breast cancer. Determining nodal status requires dedicated imaging and needle-guided biopsy, which incur extra cost and procedure-related morbidity for the patient. Our model therefore shows promise as a practical clinical decision support tool: using it, 67/95 (71%) of the clinically node-positive (N1–3) patients could be correctly identified without the need for additional imaging or biopsy.
Clinically aggressive disease (grade 2–3, triple-negative, N1–3) was detected by our model with 100% sensitivity and 100% NPV (AUC = 0.724). High sensitivity (78%), specificity (73%) and NPV (96%) were also achieved in predicting high-grade HER2+ breast cancer with nodal metastasis (grade 2–3, HER2+ and N1–3). These results are promising and could be developed further into a reliable pre-operative decision support tool that may help guide management decisions at initial diagnosis and throughout the treatment continuum.
Our model showed good performance in predicting T-stage (AUC = 0.789) and grade (AUC = 0.709) of breast tumors. For instance, high specificity and PPV (88% and 96% for grade; 90% and 75% for T-stage) were demonstrated in differentiating low-grade (grade 1) from high-grade (grade 2–3) breast cancers and in predicting T-stage. In combination with nodal stage, these results can further contribute to preoperative management decisions, particularly the choice between chemotherapy and upfront surgery.
Evaluation of molecular marker expression in breast cancer with MRI texture analysis may allow monitoring of changes in biomarker expression over time or after interventions such as neoadjuvant chemotherapy, and may help resolve problems related to tumor heterogeneity. Our model had moderate performance in detecting ER (AUC = 0.670), HER2 (AUC = 0.636) and Ki-67 (AUC = 0.589) expression and in detecting molecular subtypes (AUC = 0.658 for Luminal A), but it was not a significant predictor of Luminal B, HER2+ or TNBC subtypes and needs improvement in this respect.
In the current literature, several studies applying machine learning to larger patient cohorts have yielded results similar to ours (34). However, a common limitation of these studies is their exclusive focus on early-stage cancers, which narrows their applicability across the diverse spectrum of real-world clinical settings. Furthermore, while some studies have successfully predicted specific biomarkers, such as HER2, or concentrated on single molecular subtypes, such as TNBC, these approaches do not capture the full complexity of breast cancer diagnosis and treatment (35, 36). In contrast, the present study presents a model that more closely mirrors actual clinical practice by incorporating the full range of molecular subtypes, all cancer stages and a wide array of clinically significant biomarkers, which broadens its potential utility in clinical decision-making.
The retrospective design was one of the limitations of the study. All images were acquired on a single 1.5 T scanner; including MRI images from different vendors may improve the real-life applicability of the model. Images were also analyzed in 2D, and volumetric texture parameters from a 3D model may yield better performance. In addition to conventional statistical methods such as cluster analysis, machine learning or deep learning could be used to train the model for further improvement. Including more demographic parameters, such as patient age and history, and radiologic parameters, such as lesion size, may increase the sensitivity and overall performance of the model.
The proposed cut-off was based purely on sensitivity and specificity, without considering differences in prevalence between populations or the costs to specific populations or institutions as part of the clinical management process. A more comprehensive decision analysis of operating this model, accounting for cost (either financial or in terms of population-level welfare, such as quality-adjusted life-years), is necessary.
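To illustrate why prevalence matters, the predictive values at a fixed cut-off follow from Bayes' theorem, where Se and Sp are the sensitivity and specificity at the cut-off and π is the prevalence in the population where the model is applied. The two prevalences below are hypothetical; Se = 0.81 and Sp = 0.59 are the N0–N1 vs N2–N3 values reported in the Abstract.

```latex
\mathrm{PPV}(\pi) = \frac{\mathrm{Se}\,\pi}{\mathrm{Se}\,\pi + (1-\mathrm{Sp})(1-\pi)},
\qquad
\mathrm{NPV}(\pi) = \frac{\mathrm{Sp}\,(1-\pi)}{\mathrm{Sp}\,(1-\pi) + (1-\mathrm{Se})\,\pi}

\begin{aligned}
\pi = 0.10:&\quad \mathrm{PPV} = \frac{0.081}{0.081 + 0.369} \approx 0.18,
  &\mathrm{NPV} = \frac{0.531}{0.531 + 0.019} \approx 0.97 \\
\pi = 0.30:&\quad \mathrm{PPV} = \frac{0.243}{0.243 + 0.287} \approx 0.46,
  &\mathrm{NPV} = \frac{0.413}{0.413 + 0.057} \approx 0.88
\end{aligned}
```

With the same cut-off, the PPV more than doubles and the NPV falls by about nine percentage points between these two settings, which is why a decision analysis would need the prevalence of the target population.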
In addition, the most common indications for breast MRI are suspected multifocal/multicentric disease; size discrepancy between clinical examination and imaging, or between mammography and ultrasound; ER-negative disease or larger ER+ cancers with anticipated pre-operative systemic therapy; suspected anterior chest wall or nipple involvement; and cancers identified at supplemental screening in high-risk populations. Because of these indications, the cancers reported in our series may be biased toward advanced disease or toward patients likely to receive neoadjuvant therapy. While we acknowledge this bias, we believe our series is representative of cancers imaged with MRI nationally, and hence of the cancers from which texture features can be extracted.
Our findings support earlier studies that have reported correlations between breast cancer TNM stage and MRI characteristics, with similar ROC values and with the advantage of a larger patient sample. The results of the present study indicate that whole-tumor MRI texture analysis shows promise as a tool that can be integrated into clinical decision-making, in conjunction with histopathological markers, to identify low-risk disease with high NPV.
DCE-MRI Technique
All MRI studies were performed with patients lying prone in a 1.5 T scanner (Optima™ MR450w; GE Healthcare, Milwaukee, WI, USA) using a dedicated 8-channel breast array coil (MRI Devices Corporation, Pewaukee, WI, USA). One pre-contrast and four serial post-contrast bilateral dynamic VIBRANT sagittal image sets were acquired before and immediately after rapid intravenous bolus injection of 0.1 mmol/kg gadopentetate dimeglumine contrast medium (Magnevist; Bayer HealthCare Pharmaceuticals Inc., Wayne, NJ, USA) at a rate of 3 mL/s with a power injector (Spectris Solaris MR Injector; MEDRAD, Warrendale, PA, USA). The average dynamic temporal resolution was 90 s/phase (range 60–120 s, depending on patient size and full bilateral breast coverage); other parameters were TR/TE 5.59–7.2/1.7–18 ms, field of view 18–26 cm, matrix 256 × 256, flip angle 10°, and slice thickness/gap 1.8/0.9 mm.
Image Processing and Extraction of Texture Features
All MRI images were loaded into Horos with the OsiriX plugin (Pixmeo SARL, Geneva, Switzerland) on a secured, dedicated research computer. The series were de-identified using the RSNA Clinical Trial Processor (34) and stored in a research PACS (iPACS, Invicro, Boston, MA, USA). Lesions were segmented using regions of interest (ROIs). When multiple cancers were present, the index lesion used to clinically stage the patient was segmented. ROIs outlining the lesion of interest were drawn manually by a breast imaging fellow with 1 year of experience in MRI imaging and interpretation, supervised by a fellowship-trained breast imager with 16 years of MRI experience. When possible, the ROIs were centered in each slice on areas of contrast uptake without visible necrosis. Necrotic areas were excluded from the texture analysis, as only metabolically active tumor regions are of interest when comparing prognostic subtypes. The ROI size was chosen individually to balance sufficient voxel statistics against maximum lesion coverage. The stack of ROIs was also used to generate morphological measures of the lesion. Haralick texture features were extracted using MATLAB (version 8.5, R2015a; The MathWorks Inc., Natick, MA, USA). The distance for the Haralick features was set at 1 pixel, and features were averaged across all angles under the isotropic assumption.
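The Haralick settings above (a distance of 1 pixel, with features averaged over angles) can be sketched in plain Python. This is an illustration under stated assumptions, not the MATLAB code used in the study: the number of grey levels (a hypothetical default of 8), the linear quantization of ROI intensities, the four in-plane angles (0°, 45°, 90°, 135°) and the symmetric co-occurrence matrix are all assumptions, and only four of the Haralick features are computed.

```python
import math

# In-plane neighbour offsets (row, col) at distance d = 1 for 0°, 45°, 90°, 135°
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def quantize(image, levels, mask=None):
    """Linearly rescale ROI intensities to integer grey levels 0..levels-1."""
    inside = [v for r, row in enumerate(image) for c, v in enumerate(row)
              if mask is None or mask[r][c]]
    lo, hi = min(inside), max(inside)
    scale = (levels - 1) / (hi - lo) if hi > lo else 0.0
    return [[max(0, min(levels - 1, round((v - lo) * scale))) for v in row]
            for row in image]


def glcm(q, levels, offset, mask=None):
    """Symmetric, normalized co-occurrence matrix for one offset (None if no pairs)."""
    dr, dc = offset
    rows, cols = len(q), len(q[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue
            if mask is not None and not (mask[r][c] and mask[r2][c2]):
                continue  # only count pixel pairs lying entirely inside the ROI
            i, j = q[r][c], q[r2][c2]
            counts[i][j] += 1
            counts[j][i] += 1  # symmetric: count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts] if total else None


def haralick_subset(p):
    """Angular second moment, contrast, entropy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    asm = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    px = [sum(row) for row in p]  # row marginal (equals column marginal: p is symmetric)
    mu = sum(i * w for i, w in enumerate(px))
    var = sum((i - mu) ** 2 * w for i, w in enumerate(px))
    cov = sum((i - mu) * (j - mu) * v for i, j, v in cells)
    corr = cov / var if var > 0 else 0.0
    return {"asm": asm, "contrast": contrast, "entropy": entropy, "correlation": corr}


def isotropic_haralick(image, levels=8, mask=None):
    """Quantize the ROI, then average each feature over the four angles."""
    q = quantize(image, levels, mask)
    mats = [m for m in (glcm(q, levels, off, mask) for off in OFFSETS) if m is not None]
    feats = [haralick_subset(m) for m in mats]
    return {k: sum(f[k] for f in feats) / len(feats) for k in feats[0]}
```

For a 2 × 2 checkerboard, the horizontal and vertical neighbours always differ while the diagonal neighbours match, so the angle-averaged contrast is 0.5; the isotropic average hides this directional structure by design.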
Statistical Analysis
All features were grouped with an unsupervised, principal component analysis (PCA)-like procedure: similar features were grouped into disjoint clusters, each summarized by a linear combination of its members (the first principal component). The relationship between lesion and patient characteristics was investigated with the Pearson correlation test, and the correlations are shown as a heat map (Figure 1).
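The grouping step above can be sketched as follows. This is a simplified stand-in, not the study's procedure: dedicated tools such as SAS PROC VARCLUS split variable clusters iteratively, whereas this sketch uses a hypothetical greedy rule (a feature joins the first cluster whose seed it correlates with at |r| ≥ 0.7, an assumed threshold) and then summarizes each cluster by its first principal component, found by power iteration on the members' correlation matrix. All function names are illustrative.

```python
import math


def standardize(col):
    """Z-scores using the sample standard deviation (n - 1)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
    return [(x - mean) / sd for x in col]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)


def group_features(features, threshold=0.7):
    """Greedy disjoint grouping: a feature joins the first cluster whose seed
    feature it correlates with at |r| >= threshold, otherwise starts a new one."""
    clusters = []
    for name in features:
        for cluster in clusters:
            if abs(pearson(features[cluster[0]], features[name])) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def first_pc(features, names, iters=500, tol=1e-12):
    """Loadings and per-case scores on the first principal component of the
    standardized cluster members (power iteration on their correlation matrix)."""
    z = [standardize(features[name]) for name in names]
    k, n = len(z), len(z[0])
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(k)]
            for a in range(k)]
    w = [1.0 / (a + 1) for a in range(k)]  # non-uniform start vector
    for _ in range(iters):
        v = [sum(corr[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        done = max(abs(x - y) for x, y in zip(v, w)) < tol
        w = v
        if done:
            break
    if sum(w) < 0:  # fix the arbitrary sign of the eigenvector
        w = [-x for x in w]
    scores = [sum(w[a] * z[a][i] for a in range(k)) for i in range(n)]
    return w, scores
```

The cluster score is a fixed linear combination of standardized member features, so a single number per lesion can stand in for several highly correlated texture features in the subsequent ROC analysis.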
A soft version of the Bootstrap Lasso (Bolasso) feature selection method was used. Specifically, 500 bootstrap replicates of the data were generated by simple random sampling with replacement. In each replicate, features were selected using the Lasso with regularization parameter rho = 0.8. Feature importance was evaluated by the selection frequency across the bootstrap samples. The final model consisted of features selected in at least 80% of the bootstrap replicates and was evaluated by receiver operating characteristic (ROC) analysis under leave-one-out (LOO) cross-validation. A cut-off on the ROC curve was chosen by maximizing the Youden index, and the corresponding accuracy, sensitivity, specificity, positive predictive value (PPV) and NPV were calculated with 95% confidence intervals.
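The soft Bolasso procedure above can be sketched in plain Python. This is a simplified illustration, not the study's SAS implementation: it fits a least-squares Lasso (a linear-probability model for a 0/1 outcome) by cyclic coordinate descent, `lasso_support` and `soft_bolasso` are hypothetical names, and the penalty `lam` is left as a free argument because the text does not state how its reported rho = 0.8 maps onto λ in this objective. The 500 replicates and the 80% retention threshold follow the text.

```python
import random


def soft_threshold(z, gamma):
    """Lasso shrinkage operator S(z, gamma)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_support(X, y, lam, max_iter=200, tol=1e-8):
    """Indices of features with non-zero coefficients in a least-squares Lasso,
    min (1/2n)||y - b0 - Xb||^2 + lam*||b||_1, fitted by cyclic coordinate
    descent on columns standardized within this sample."""
    n, p = len(y), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - m) / sd if sd > 0 else 0.0 for v in col])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # intercept absorbed by centring y
    beta = [0.0] * p
    for _ in range(max_iter):
        max_step = 0.0
        for j, xj in enumerate(cols):
            # with unit-variance xj, the coordinate minimizer is S(xj.r/n + beta_j, lam)
            z = sum(a * b for a, b in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(z, lam)
            step = new - beta[j]
            if step:
                resid = [r - step * a for r, a in zip(resid, xj)]
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return {j for j, b in enumerate(beta) if b != 0.0}


def soft_bolasso(X, y, lam, n_boot=500, keep=0.8, seed=0):
    """Selection frequency of each feature over bootstrap replicates; features
    selected in at least `keep` of the replicates form the final model."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n cases with replacement
        for j in lasso_support([X[i] for i in idx], [y[i] for i in idx], lam):
            counts[j] += 1
    freq = [c / n_boot for c in counts]
    return [j for j, f in enumerate(freq) if f >= keep], freq
```

The "soft" variant keeps features by selection frequency rather than requiring selection in every replicate, as in the original Bolasso intersection rule; the LOO evaluation of the retained features is a separate step.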
ROC analysis was used to compare associations between the cluster components and clinical outcomes, adjusted for age and race. AUC values were calculated with LOO cross-validation, and the diagnostic performance at the cut-off was calculated. Statistical analyses were performed with SAS, version 9.4 (SAS Institute Inc., Cary, NC, USA). The significance level was set at 0.05, with Bonferroni correction for multiple comparisons when necessary.
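The evaluation steps (AUC, the Youden-optimal cut-off, the confusion-matrix metrics and the Bonferroni-adjusted level) can be sketched as below. This is an illustrative plain-Python version with hypothetical function names, not the SAS procedures used in the study; it assumes that `scores` are the leave-one-out predicted values for each patient and that a score at or above the cut-off is called positive. Confidence intervals are omitted because the text does not state which interval method was used.

```python
def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a positive case scores
    higher than a negative case (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(scores, labels, cut):
    """Confusion-matrix metrics when a score >= cut is called positive."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        if s >= cut:
            tp, fp = tp + (y == 1), fp + (y == 0)
        else:
            fn, tn = fn + (y == 1), tn + (y == 0)

    def ratio(a, b):
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
        "npv": ratio(tn, tn + fn),          # TN / (TN + FN)
        "accuracy": (tp + tn) / len(labels),
    }


def youden_cutoff(scores, labels):
    """Observed score that maximizes Youden's J = sensitivity + specificity - 1."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        m = metrics_at(scores, labels, cut)
        j = m["sensitivity"] + m["specificity"] - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, metrics_at(scores, labels, best_cut)


def bonferroni_alpha(alpha, n_tests):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_tests
```

As a check on the definitions, the Discussion's figure of 67 of 95 node-positive patients correctly identified corresponds to a sensitivity of 67/95 ≈ 0.71.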
Patient and Lesion Characteristics
Two hundred and eight patients with breast cancer underwent breast MRI and were included in the study. Median (range) patient age was 49.8 (21–79) years. The median T, N and M stages were T2 (46.6%), N0 (54.3%) and M0 (83.7%), respectively. Mean Ki-67 expression was 42.2%. Further patient characteristics (race, age) and lesion characteristics [grade, TNM stage, Ki-67 expression, ER and HER2 status] are summarized in Table 1.
Texture Parameter Clustering
Texture parameters with the highest correlation with prognostic factors, as determined by the Pearson correlation test, were grouped into three main clusters comprising the following parameters:
• Cluster 1: Total, correlation, sum entropy, entropy.
• Cluster 2: Angular second moment, correlation difference variance, difference entropy and information measure of correlation 2.
• Cluster 3: Maximum, minimum, standard deviation, mean, contrast, sum of squares, inverse difference moment, sum average and sum variance.
Further correlations between the parameters are shown as a heat map in Figure 1.
Prediction of Tumor Grade and Stage
The Cluster 1 model showed the highest performance in predicting tumor grade, clinical nodal stage and T stage of breast tumors (AUC = 0.709, 0.782 and 0.789, respectively). T stage [T1–T2 (n = 138) vs T3–T4 (n = 70)] was predicted with 58% sensitivity, 90% specificity, 75% PPV, 80% NPV and 79% accuracy [AUC = 0.789, 95% confidence interval (CI) 0.718–0.860]. At the optimal cut-off point of the Cluster 1 model, the presence of clinically evident regional lymph node metastasis [cN0 (n = 113) vs cN1–3 (n = 95)] was predicted with moderate-to-high sensitivity (71%), specificity (79%), PPV (76%), NPV (74%) and accuracy (75%) (AUC = 0.782, 95% CI 0.715–0.850). High-grade tumors (grade 2 or 3, n = 171) were detected by the Cluster 1 model with high specificity (88%) and PPV (96%), but sensitivity (56%), NPV (29%) and accuracy (61%) were moderate at best (AUC = 0.709, 95% CI 0.626–0.792) at the optimal cut-off point (Figure 2). Higher sensitivity (81%) and NPV (95%) were achieved for N0–N1 vs N2–N3 (AUC = 0.739, 95% CI 0.632–0.848) (Figure 3).
Prediction of Molecular Biomarker Expression and Molecular Subtype
The Cluster 1 model also had the best performance in detecting ER, HER2 and Ki-67 expression in breast tumors (AUC = 0.670, 0.636 and 0.589, respectively), compared with the Cluster 2 and 3 models. In predicting ER-positive disease (n = 150), the model had 67% sensitivity, 67% specificity, 85% PPV, 43% NPV and 67% accuracy (AUC = 0.670, 95% CI 0.585–0.755). HER2 positivity (n = 50) was detected with moderate-to-high sensitivity (79%) and NPV (87%), and moderate-to-low specificity (48%), PPV (34%) and accuracy (56%) (AUC = 0.636, 95% CI 0.523–0.729). However, the Cluster 1 model was not a significant predictor of Ki-67 expression (n = 73) (<14% vs ≥14%) in breast cancer, with low sensitivity (54%) and specificity (68%) (AUC = 0.589, 95% CI 0.486–0.692).
Cluster 1 had 74% sensitivity, 63% specificity and 94% NPV for distinguishing Luminal A tumors (n = 31) from other molecular subtypes (AUC = 0.658, 95% CI 0.556–0.759), whereas it was not a significant predictor for Luminal B (n = 119), TNBC (n = 40) or HER2+ (n = 18) molecular subtypes.
Prediction of Tumor Aggression
The Cluster 1 model showed the best performance in detecting late-stage, aggressive breast cancer (grade 2–3, T3–4, HER2+ or triple-negative vs grade 1, T1–2, Luminal A or B) (AUC = 0.820 and 0.724, respectively). In detecting high-grade, HER2-positive disease with lymph node metastases (grade 2–3, HER2+ and N1–3), it showed 78% sensitivity, 74% specificity, 94% NPV and 74% accuracy (AUC = 0.820, 95% CI 0.728–0.913). In distinguishing high-grade TNBC with nodal metastases (biologically aggressive disease) from other subtypes, the Cluster 1 model had 100% sensitivity and NPV, with moderate-to-low specificity (42%), PPV (11%) and accuracy (46%) (Figure 4).
Diagnostic performance of Cluster 1 model in predicting various prognostic parameters at the selected cut-off points is further summarized in Table 2.