ABSTRACT
Objective
Breast cancer (BC) is highly heterogeneous and one of the most common cancers. Luminal A (LUM A) is a subtype of BC with a better prognosis than other BC subtypes. The molecular mechanisms underlying the initiation and progression of the LUM A subtype are still unclear. Big data generated from microarray and sequencing systems can be re-analyzed, especially with the help of various in silico tools developed in recent years, and made applicable for in vitro and in vivo research. This work aimed to identify genes that may play a role in the progression of LUM A subtype of BC using both computational and laboratory-based methods.
Materials and Methods
Overlapping genes associated with BC were identified from The Cancer Genome Atlas database, the GSE233242 and GSE100925 GEO datasets, and the Geneshot tool. Functional network analysis of the overlapping genes was performed with STRING 12.0. Expression levels of overlapping genes in BC were investigated with the TNMplot (https://tnmplot.com/analysis/) in silico tool. The effect of overlapping genes on the overall survival of LUM A cancer patients was assessed using the Kaplan-Meier plotter tool. The expression of genes identified from bioinformatics data was investigated via quantitative real-time polymerase chain reaction (qRT-PCR) in LUM A tumor and adjacent tissue samples. The data were evaluated using the t-test. The sensitivity and specificity of the selected genes were determined using the receiver operating characteristic curve.
Results
In silico investigation showed that eleven genes were possibly associated with BC. Among them, CDC25A, AURKB, and TOP2A were considerably increased in LUM A tumor samples according to qRT-PCR results. Overall survival analysis also showed that overexpression of these three genes was associated with reduced overall survival of LUM A patients.
Conclusion
The genes CDC25A, AURKB, and TOP2A may play crucial roles in LUM A pathogenesis. Therapeutic strategies that diminish the expression of these connected genes may improve the prognosis of LUM A patients.
Key Points
• Overexpression of CDC25A, AURKB, and TOP2A can be potential biomarkers for luminal A.
Introduction
Breast cancer (BC) is the most prevalent cancer in women globally and accounts for the second-highest proportion of cancer-related deaths among women. BC is a disease that varies greatly regarding morphological and biological characteristics, clinical behavior, and therapeutic responses (1, 2). BC is currently classified molecularly as luminal A (LUM A), luminal B (LUM B), human epidermal growth factor receptor 2 positive (HER2+), triple-negative BC (TNBC), and normal breast-like (3). Studies indicate that further subdividing these subgroups is both possible and important. Approximately 70% of BC patients have the LUM A subtype, which is estrogen receptor-positive (ER+) but lacks HER2 amplification (4). LUM A tumors have a lower probability of recurrence than other subtypes of BC. However, there is still a need to understand the mechanisms behind the onset and progression of the LUM A subtype, which has a variable prognosis (5). Since the tumor is hormone receptor-positive, endocrine therapy is the preferred treatment for LUM A BC. However, the efficacy of endocrine therapy for LUM A may differ based on several genetic factors (6). For example, it was proposed that GATA3 mutations may result in altered gene expression in ER-positive BCs, which might influence prognosis (7). Alfarsi et al. (8) showed that high KIF18A expression is a prognostic factor and can predict adverse outcomes of endocrine treatment in individuals with ER-positive BC. Therefore, identifying differentially expressed genes may be valuable for more precise categorization, clarification of molecular pathways, and improving disease treatment success rates in the future. In the current study, a bioinformatic approach was used to identify overlapping genes among The Cancer Genome Atlas (TCGA), two BC-related Gene Expression Omnibus datasets, and BC-relevant genes from the Geneshot tool. Several in silico tools were used to conduct an enrichment analysis of overlapping genes.
The expression of three overlapping genes was further investigated using the quantitative real-time polymerase chain reaction (qRT-PCR) method in tumor and adjacent normal tissue samples from 30 LUM A cancer patients. The results were then evaluated using receiver operating characteristic (ROC) analysis.
Materials and Methods
Using Bioinformatics Approaches to Uncover BC-Associated Genes
Analysis of Gene Expression Alterations in TCGA BC Samples
TCGA (https://cancergenome.nih.gov/) is a major database in which approximately 20,000 primary tumors and adjacent samples from dozens of cancer types have been molecularly characterized. The TCGA-BC data were analyzed using the GEPIA2 online tool (http://gepia2.cancer-pku.cn/) to determine significant genes. The “Differential Expression Analysis” option was first selected in GEPIA2. On the page that opened, “Breast cancer” was then selected in the dataset section and “ANOVA” in the method section. Overexpressed genes in the TCGA-BC data were identified using the criteria log fold change (logFC) >+1 and p<0.001.
Detection of Gene Expression Changes in BC Gene Expression Omnibus Datasets
Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/gds) is a publicly available source of functional genomics data. The GEO database accepts results from microarray and sequencing-based studies. GEO datasets related to many diseases can be downloaded from the database and their expression profiles reanalyzed. The GSE233242 (29 LUM A tumor tissues and adjacent normal tissue samples) and GSE100925 (36 BC tumor tissue samples and adjacent normal tissue samples) datasets were obtained from the GEO database and analyzed using GEO2R. GEO2R is a user-friendly online tool that allows users to compare groups of samples within a GEO series to identify differentially expressed genes, miRNAs, circRNAs, and other molecules. From the GEO2R results, genes with logFC >+1 and p<0.001 were selected.
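The logFC >+1 and p<0.001 selection applied to the GEPIA2 and GEO2R outputs can be illustrated with a minimal Python sketch. The gene names, values, and the function name `select_overexpressed` are hypothetical and are not taken from the study; the sketch only shows how the two thresholds combine.

```python
# Hypothetical differential-expression rows: (gene, logFC, p-value).
# These values are invented for illustration, not taken from the study.
rows = [
    ("GENE_A", 2.31, 1.0e-6),
    ("GENE_B", 0.85, 1.0e-8),  # fails the logFC threshold
    ("GENE_C", 1.40, 5.0e-3),  # fails the p-value threshold
    ("GENE_D", 1.05, 9.0e-4),
]

LOGFC_MIN = 1.0  # logFC > +1, as stated in the text
P_MAX = 0.001    # p < 0.001, as stated in the text


def select_overexpressed(table, logfc_min=LOGFC_MIN, p_max=P_MAX):
    """Return names of genes that pass both thresholds (strict inequalities)."""
    return [gene for gene, logfc, p in table if logfc > logfc_min and p < p_max]


selected = select_overexpressed(rows)
print(selected)
```

Both conditions must hold at once, so a gene with a very small p-value but a modest fold change (GENE_B here) is excluded.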
Determination of the Most Closely BC-associated Genes
Geneshot is a free, publicly available tool that allows researchers to obtain ranked lists of genes related to search terms (9). The Geneshot tool was used to screen for BC-associated genes. The search query “Breast cancer” was inputted in the “Search for these terms” field, and the number “500” was entered in the “Top Associated Genes to Make Predictions” search field in Geneshot. Subsequently, the option “AutoRIF (automatically search from PubMed)” was chosen.
Determination of Overlapping Genes
Overlapping BC-related genes were identified across the TCGA database, the GSE233242 and GSE100925 GEO datasets, and the Geneshot tool. Then, a Venn diagram was generated using the FunRich functional enrichment analysis tool (http://www.funrich.org/).
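Conceptually, the overlap is the intersection of the four gene lists. The sketch below shows this in Python, assuming each source has already been reduced to a set of gene symbols. The placeholder symbols (GENE_X, GENE_Y, GENE_Z) and the set contents are hypothetical; only CDC25A, AURKB, and TOP2A are genes reported as overlapping in this study.

```python
# Hypothetical thresholded gene sets, one per source.
tcga = {"CDC25A", "AURKB", "TOP2A", "GENE_X"}
gse233242 = {"CDC25A", "AURKB", "TOP2A", "GENE_X", "GENE_Y"}
gse100925 = {"CDC25A", "AURKB", "TOP2A", "GENE_Y"}
geneshot = {"CDC25A", "AURKB", "TOP2A", "GENE_Z"}

# A gene is "overlapping" only if it appears in every one of the four sets.
overlap = set.intersection(tcga, gse233242, gse100925, geneshot)
print(sorted(overlap))
```

A gene shared by only some of the sources, such as GENE_X here, is dropped, which is why the overlap is much smaller than any individual list.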
Enrichment Analyses of Overlapping Genes
STRING 12.0 (https://string-db.org/) is a software tool and knowledgebase for identifying and predicting protein-protein interactions. Functional network analysis of the overlapping genes was performed with STRING 12.0. TNMplot (https://tnmplot.com/analysis/) is a free and publicly available tool that allows differential gene expression analysis in tumor, normal, and metastatic tissues using TCGA, GEO, and GTEx data. Expression levels of overlapping genes in BC were investigated with the TNMplot in silico tool. Kaplan-Meier plotter (KM plotter) (https://kmplot.com/analysis/) is a web-based tool designed to evaluate the relationship between gene/miRNA expression and survival in various forms of cancer, using publicly available transcriptome data such as TCGA. The effect of overlapping genes on the overall survival (OS) of LUM A patients was assessed using the KM plotter tool. Spearman correlation analysis of three overlapping genes was performed on TCGA-BC data using GEPIA2.
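For readers unfamiliar with the Spearman coefficient, the following sketch computes it as the Pearson correlation of rank-transformed values, with tied values receiving their average rank. It is an illustrative standard-library implementation, not the code GEPIA2 runs, and the expression values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for two genes across five samples.
gene_a = [1.2, 2.5, 3.1, 4.8, 5.0]
gene_b = [0.9, 2.0, 2.2, 6.1, 7.3]
rho = spearman(gene_a, gene_b)
print(round(rho, 3))
```

Because only ranks are used, the coefficient reflects whether two genes rise and fall together across samples, regardless of whether the relationship is linear.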
Verification of Bioinformatics-Derived Data
Patients and Specimens
From November 2020 to November 2022, 30 pairs of human BC specimens (tumor tissues and adjacent normal tissues) were obtained from patients who underwent breast surgery at the İstanbul Faculty of Medicine Hospital, Department of General Surgery, İstanbul University (İstanbul, Turkey). The study was approved by the Ethics and Scientific Committees of İstanbul Faculty of Medicine, İstanbul University (number: 29624016-050.99-903, date: 01.07.2020). Written informed consent was obtained from all patients.
Investigation of the Chosen Genes’ Relative Expressions Using qRT-PCR
Total RNA from 30 pairs of LUM A tissue samples was extracted with TRIzol reagent (Invitrogen, San Diego, CA, USA) following the manufacturer’s instructions. A NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) was used to evaluate the quality and amount of the RNA samples. To investigate the expression of the chosen genes, the same amount of RNA from each sample was reverse transcribed into cDNA using the cDNA Reverse Transcription Kit (Thermo). qRT-PCR experiments were conducted using 5x HOT FIRE qPCR Mix Plus (Solis BioDyne, Tartu, Estonia). GAPDH expression was used to normalize gene expression. Each reaction was conducted at least twice. Relative gene expression was calculated using the 2^-ΔΔCt method. The sensitivity and specificity of the genes were determined using the receiver operating characteristic (ROC) curve.
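The 2^-ΔΔCt calculation can be sketched as follows, assuming the conventional definitions ΔCt = Ct(target) - Ct(GAPDH) within each sample and ΔΔCt = ΔCt(tumor) - ΔCt(adjacent normal). The Ct values and the function name `fold_change` are hypothetical and serve only to illustrate the arithmetic.

```python
def fold_change(ct_target_tumor, ct_ref_tumor, ct_target_normal, ct_ref_normal):
    """Relative expression (tumor vs. adjacent normal) by the 2^-ddCt method.

    ct_ref_* are the Ct values of the reference gene (GAPDH in this study).
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor      # dCt in tumor tissue
    delta_normal = ct_target_normal - ct_ref_normal   # dCt in adjacent normal tissue
    ddct = delta_tumor - delta_normal                 # ddCt
    return 2 ** (-ddct)


# Hypothetical Ct values for one tumor/normal pair.
fc = fold_change(ct_target_tumor=24.0, ct_ref_tumor=18.0,
                 ct_target_normal=27.0, ct_ref_normal=19.0)
print(fc)
```

A value above 1 indicates higher expression in the tumor sample than in the paired normal tissue, and a value of 1 indicates no change.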
Statistical Analysis
Bioinformatic evaluations were performed using publicly available platforms. The current study employed the 2^-ΔΔCt method to compare gene expression levels between tumor specimens and adjacent normal tissues. Differences between groups were evaluated using the t-test. Data are presented as mean ± standard deviation. Data were analyzed using GraphPad Prism version 10.0 (www.graphpad.com). A statistically significant difference was defined as p<0.05. The ROC curves, the area under the ROC curve, the cut-off point, sensitivity, and specificity were calculated for all genes.
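The ROC quantities listed above can be illustrated with a short standard-library sketch. It is not the GraphPad Prism procedure: the area under the curve is computed here as the probability that a tumor sample scores higher than a normal sample, and the cut-off is chosen by Youden's J statistic, which is an assumption made for this example. The expression values are hypothetical.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for every candidate cut-off.

    A sample is called positive when its score is >= the cut-off.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cutoff)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cutoff)
        points.append((cutoff, tp / pos, tn / neg))
    return points


def auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda t: t[1] + t[2] - 1)


# Hypothetical relative-expression values; label 1 = tumor, 0 = adjacent normal.
scores = [0.8, 1.1, 1.5, 2.4, 1.9, 2.1, 3.0, 3.6, 4.2]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

area = auc(scores, labels)
cutoff, sensitivity, specificity = best_cutoff(scores, labels)
print(area, cutoff, sensitivity, specificity)
```

An area close to 1 means the gene's expression separates tumor from normal tissue well, while an area near 0.5 means it performs no better than chance.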
Results
Bioinformatics Analyses Showed an Overlap of 11 Genes in Datasets
Eleven genes overlapped among the TCGA-BC data, the GSE233242 and GSE100925 GEO datasets, and the Geneshot tool under the logFC>+1 and p<0.001 criteria (Table 1).
Bioinformatics Data Used in the Current Study Show That 11 Genes Are Closely Associated With BC
In the TCGA-BC data, 250 genes met the logFC>+1, p<0.001 criteria, compared with 1858 genes in the GSE233242 dataset and 257 genes in the GSE100925 dataset. These genes were compared to the 500 most BC-associated genes in the Geneshot tool, and 11 of them overlapped (Figure 1A). Network analysis performed with STRING 12.0 showed that the overlapping genes had significantly more interactions than expected (p<1.0e-16) (Figure 1B). Analysis of TCGA-BC RNA-seq data using TNMplot revealed that the expression of all 11 genes was higher in tumor and metastatic tissue samples than in normal tissues (Figure 1C). Moreover, a PubMed search combining “Breast cancer” with each gene name showed that all of these genes have been associated with BC. Notably, the 11 chosen genes were significantly interconnected. These findings imply that developing specific therapy approaches to inhibit the expression of these genes might be beneficial.
Overlapping Genes May Be Biomarkers for LUM A Overall Survival
OS analysis using the KM plotter tool demonstrated that overexpression of the overlapping genes, with the exception of PLK1, significantly affected LUM A OS (Figure 2). Three of the overlapping genes were strongly correlated with each other. According to the Spearman correlation analysis carried out on TCGA-BC data using GEPIA2, the CDC25A, AURKB, and TOP2A genes are most likely to be co-expressed (Figure 3). According to qRT-PCR results, all three selected genes (CDC25A, AURKB, TOP2A) showed increased expression in LUM A tumor samples compared to adjacent normal tissue samples (Figure 4). We performed a ROC curve analysis to determine whether the selected genes could be utilized as prognostic biomarkers. Our findings indicated that CDC25A, AURKB, and TOP2A are promising LUM A indicators (Figure 5).
Discussion and Conclusion
Studies have revealed that BC is a very heterogeneous cancer at the molecular level (2, 10). The molecular mechanisms need to be elucidated more clearly to develop treatment strategies. With the advances in microarray and sequencing technologies in recent years, a substantial volume of raw data on several types of malignancies, including BC, has accumulated. Validating all of these data in vitro or in vivo is a highly difficult and costly undertaking. Consequently, several in silico tools have been developed to aid in filtering and processing these data. Using in silico tools, many genes, miRNAs, and other molecules that may play a role in BC have been suggested (11-13). In the current investigation, we employed several in silico tools to identify genes that may be linked to LUM A. These genes were subsequently verified in LUM A patient samples.
Studies demonstrated that all 11 genes we identified with bioinformatics methods in our study are closely related to BC. For example, BIRC5 has been reported to mediate poor response to radiotherapy in HER2-positive BCs (14). Elevated CCNB1 expression has been related to a poor prognosis and tumor immune infiltration in BC (15). It has been shown that successful treatment results can be achieved in BC subtypes by targeting the transcription factor FOXM1, which has an oncogenic effect in BC (16). However, the status of these genes’ expression and roles in LUM A are unclear. Moreover, in our study, the co-expression scores of these genes were found to be higher than expected (Figure 1B), and the significant potential of these genes to play a role in LUM A-OS indicates that they may be of critical importance for LUM A (Figure 2). Identifying genes that have similar functions within the cell and exhibit stronger interactions with each other is crucial for understanding molecular pathways. Thus, our research findings are valuable and the suggested genes can be regarded as indicators for elucidating the molecular mechanisms involved in developing LUM A. Therefore, more detailed studies are needed to elucidate the roles of these genes in the LUM A subtype.
The expression levels of CDC25A, AURKB, and TOP2A, three of the 11 overlapping genes detected using bioinformatics methods, were investigated in 30 LUM A specimens by qRT-PCR. All three genes were overexpressed in LUM A tumor samples compared to adjacent normal tissue samples. Moreover, ROC curve analysis indicated that the expression of these genes was a reliable indicator. These findings suggest that these three genes may play important roles in the LUM A subtype.
CDC25A is a cell cycle-accelerating phosphatase, and increased expression of this gene has been associated with many cancers (17). Although studies have clearly shown the relationship between CDC25A and BC, the function of CDC25A in LUM A remains unclear (18). CDC25A is involved in BC processes together with many genes and miRNAs. For example, Feng et al. (19) showed that CDC25A participated in BC metastasis by controlling matrix metalloprotease 1 through Foxo1. Ectopic miR-100-5p expression has been demonstrated to reduce BC cell proliferation, migration, and invasion while increasing apoptosis by inhibiting the expression of CDC25A (20). MicroRNA-99a-5p has been reported to suppress BC progression and cell cycle pathways by downregulating CDC25A (21).
The AURKB gene is also closely associated with BC. For instance, it has been shown that polymorphisms in the AURKB gene can predict the OS or disease-free survival of TNBC patients treated with taxane-based adjuvant chemotherapy (22). O6-benzylguanine, an O6-methylguanine-DNA methyltransferase (MGMT) inhibitor, has been shown to reduce the expression of many genes, including TOP2A and AURKB, sensitizing ER-positive BC to temozolomide (23). Another study suggested that the NEK2, BIRC5, and TOP2A genes may be potential targets in obese patients with LUM A BC (24).
TOP2A is an isoform of TOP2, a nuclear protein that plays an important role in DNA replication and cell division. TOP2A is highly expressed in proliferating and growing cells, and overexpression of this gene has been detected in various human malignancies, such as hepatocellular carcinoma, primary BC, and colon cancer (25).
Several TOP2A inhibitors have been used to treat different malignancies (25, 26). Studies have shown that the expression alteration of TOP2A, which is targeted by multiple microRNAs and long non-coding RNAs, has a role in cancer processes. For example, a study targeting the long non-coding RNA MALAT1 demonstrated that BC cells were suppressed via the microRNA-561-3p/TOP2A axis (27). Although it is known that TOP2A generally shows increased expression levels in BC, there is not enough data regarding its expression level in LUM A patients (24, 28).
The expression of genes can be controlled in several ways (29-31). Non-coding RNAs, such as microRNAs and circular RNAs, are crucial molecules that regulate gene expression (32-34). Studies have demonstrated that alterations in the expression of these non-coding RNAs can be important in several cancer processes via many targeted genes (35).
Although non-coding RNAs have not yet been employed in therapy, it is anticipated that they may have enormous potential in the future. In recent years, several inhibitors have been discovered to decrease the expression of overexpressed genes in the cell. We believe that therapy methods can be developed in the future by inhibiting the expression of genes such as CDC25A, AURKB, and TOP2A in LUM A cancer utilizing different inhibitors and/or noncoding RNAs. Further studies can be performed using in vitro and in vivo methods to silence the expression of these genes and uncover their functional implications on cancer processes. Therefore, our findings will provide hints for future in vitro and in vivo investigations.