Transforming Breast Cancer Prediction: Advanced Machine Learning Models for Accurate Prediction and Personalized Care
DOI: https://doi.org/10.6000/1929-6029.2025.14.54
Keywords: Breast Cancer, Machine Learning, Random Forest, AUC-ROC, Predictive Modeling
Abstract
Background: Breast cancer is the most common malignancy among women worldwide, underscoring the importance of early detection and accurate prognostication. Machine learning (ML) has emerged as a promising approach, offering powerful tools for analyzing complex datasets in breast cancer prediction and diagnosis.
Objective: This study evaluates the predictive performance of diverse ML algorithms for breast cancer classification using publicly available datasets, focusing on accuracy, interpretability, and generalizability.
Methods: The dataset included clinical and demographic variables such as age, menopausal status, tumor size, and lymph node involvement. Data preprocessing addressed missing values and class imbalance, with the Synthetic Minority Oversampling Technique (SMOTE) applied to improve sensitivity for the minority class. Feature engineering involved interaction terms and scaling of numerical variables. Multiple ML models—Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), and Neural Networks—were trained and evaluated. Performance was measured using sensitivity, F1-score, and AUC-ROC. Model interpretability was enhanced with SHapley Additive exPlanations (SHAP).
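The oversampling step described in the Methods can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbors. This is not the authors' code. The function name, the toy feature values, and the choice of k are assumptions made for illustration, and a study like this one would more likely rely on an established library implementation. Oversampling is usually applied to the training split only, so that synthetic points do not leak into evaluation data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority points.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbors to choose from (capped at len - 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbor = minority[rng.choice(neighbors)]
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical toy data: [tumor size in mm, number of positive lymph nodes]
minority = [[32.0, 4.0], [41.0, 6.0], [28.0, 3.0], [36.0, 5.0]]
synthetic = smote(minority, n_new=6, k=2)
print(len(synthetic), "synthetic minority samples generated")
```

Because every synthetic point lies between two real minority samples, the new points stay inside the region already occupied by the minority class rather than being exact duplicates of existing rows.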
Results: Random Forest achieved the best performance with an AUC-ROC of 0.9751. The other models reported here scored lower: SVM (0.9344), Neural Networks (0.9254), Gradient Boosting (0.9242), and Logistic Regression (0.9005). Ensemble models showed higher accuracy and generalizability, particularly on external validation. Tumor size and lymph node involvement emerged as key predictors. SMOTE improved sensitivity across models.
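To make the reported metrics concrete, the sketch below computes AUC-ROC, sensitivity, and F1-score from scratch. AUC-ROC is computed as the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as half. The labels, scores, and the 0.5 decision threshold are hypothetical toy values, not data from the study, and the function names are chosen only for this illustration.

```python
def auc_roc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def sensitivity_and_f1(y_true, y_pred):
    """Sensitivity (recall on the positive class) and F1-score for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    total = precision + sensitivity
    f1 = 2 * precision * sensitivity / total if total else 0.0
    return sensitivity, f1


# Hypothetical labels (1 = positive class) and model scores
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold of 0.5

auc = auc_roc(y_true, y_score)
sens, f1 = sensitivity_and_f1(y_true, y_pred)
print(f"AUC-ROC={auc:.4f} sensitivity={sens:.2f} F1={f1:.2f}")
# prints: AUC-ROC=0.9375 sensitivity=0.75 F1=0.75
```

AUC-ROC depends only on how the scores rank the cases, while sensitivity and F1 depend on the chosen threshold, which is why a model can rank well yet still miss positive cases at a given cutoff.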
Conclusion: This study demonstrates the potential of ML in breast cancer prediction, emphasizing the effectiveness of ensemble methods and interpretability tools. Future work should focus on integrating ML into clinical practice for earlier detection and personalized treatment.
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Policy for Journals/Articles with Open Access
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
Policy for Journals / Manuscript with Paid Access
Authors who publish with this journal agree to the following terms:
- The publisher retains copyright.
- Authors are permitted and encouraged to post links to their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.