Transforming Breast Cancer Prediction: Advanced Machine Learning Models for Accurate Prediction and Personalized Care

Authors

  • Usha Adiga, Department of Biochemistry, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
  • Sampara Vasishta, Department of Biochemistry, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
  • Alfred J. Augustine, Department of Surgery, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
  • Kasala Farzia, Department of Biochemistry, Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
  • Eddula Venkataravikanth, Department of Dermatology (DVL), Apollo Institute of Medical Sciences and Research Chittoor, Murukambattu - 517127, Chittoor, Andhra Pradesh, India
  • Lokesh Ravi, Centre for Digital Health & Precision Medicine, The Apollo University, Chittoor, Andhra Pradesh, 517127, India

DOI:

https://doi.org/10.6000/1929-6029.2025.14.54

Keywords:

Breast Cancer, Machine Learning, Random Forest, AUC-ROC, Predictive Modeling

Abstract

Background: Breast cancer is the most common malignancy among women worldwide, underscoring the importance of early detection and accurate prognostication. Machine learning (ML) has emerged as a promising approach, offering powerful tools for analyzing complex datasets in breast cancer prediction and diagnosis.

Objective: This study evaluates the predictive performance of diverse ML algorithms for breast cancer classification using publicly available datasets, focusing on accuracy, interpretability, and generalizability.

Methods: The dataset included clinical and demographic variables such as age, menopausal status, tumor size, and lymph node involvement. Data preprocessing addressed missing values and class imbalance, with the Synthetic Minority Oversampling Technique (SMOTE) applied to improve sensitivity for the minority class. Feature engineering involved interaction terms and scaling of numerical variables. Eight ML models were trained and evaluated: Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbors (KNN), and Neural Networks. Performance was measured using sensitivity, F1-score, and AUC-ROC. Model interpretability was enhanced with SHapley Additive exPlanations (SHAP).
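
The oversampling step named in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smote_oversample`, the toy feature values, and the choice of `k` are all assumptions made for illustration. It shows only the core SMOTE idea of creating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class: [tumor size in mm, positive lymph nodes].
minority = [[32.0, 4.0], [41.0, 7.0], [28.0, 3.0], [36.0, 5.0]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would usually be preferred over a hand-written one.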

Results: Random Forest achieved the highest AUC-ROC (0.9751), followed by SVM (0.9344), Neural Networks (0.9254), Gradient Boosting (0.9242), and Logistic Regression (0.9005). Ensemble models showed higher accuracy and generalizability, particularly on external validation. Tumor size and lymph node involvement emerged as key predictors. SMOTE improved sensitivity across models.
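
For readers unfamiliar with the metrics reported here, the sketch below computes AUC-ROC, sensitivity, and F1-score on a small made-up example. The labels, scores, and the 0.5 decision threshold are assumptions chosen for illustration and are unrelated to the study's data. AUC-ROC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def auc_roc(y_true, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_and_f1(y_true, y_pred):
    """Return (sensitivity, F1) for binary labels where 1 is the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1


# Hypothetical labels and model scores (not taken from the study).
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = auc_roc(y_true, scores)
sens, f1 = sensitivity_and_f1(y_true, y_pred)
```

AUC-ROC depends only on how the scores rank cases, whereas sensitivity and F1 depend on the chosen threshold, which is one reason the three metrics are usually reported together.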

Conclusion: This study demonstrates the potential of ML in breast cancer prediction, emphasizing the effectiveness of ensemble methods and interpretability tools. Future work should focus on integrating ML into clinical practice for earlier detection and personalized treatment.

Published

2025-09-26

Section

Special Issue: Trends in Artificial Intelligence and Machine Learning in Healthcare

How to Cite

Transforming Breast Cancer Prediction: Advanced Machine Learning Models for Accurate Prediction and Personalized Care. (2025). International Journal of Statistics in Medical Research, 14, 569-577. https://doi.org/10.6000/1929-6029.2025.14.54
