Breast Cancer Risk Prediction with Stochastic Gradient Boosting

Kıvrak M.

CLINICAL CANCER INVESTIGATION JOURNAL, vol.11, no.2, pp.26-31, 2022 (ESCI)

  • Publication Type: Article
  • Volume: 11 Issue: 2
  • Publication Date: 2022
  • Doi Number: 10.51847/21qrrklo4y
  • Journal Indexes: Emerging Sources Citation Index (ESCI)
  • Page Numbers: pp.26-31
  • Keywords: Breast cancer, Machine learning, Ensemble learning, Stochastic gradient boosting
  • Recep Tayyip Erdoğan University Affiliated: Yes


Breast cancer, an important public health problem worldwide, is one of the deadliest cancers in women. This study aims to classify open-access breast cancer data and to identify important risk factors using the Stochastic Gradient Boosting method. An open-access breast cancer dataset was used to construct the classification model, and Stochastic Gradient Boosting was used to classify the disease. Model performance was evaluated with accuracy, balanced accuracy, sensitivity, specificity, and positive/negative predictive values. The accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score obtained with the Stochastic Gradient Boosting model were all 100%. In addition, according to the variable importance values obtained, the most important risk factors for breast cancer were concave.points_mean, area_worst, perimeter_worst, and concave.points_worst, respectively. According to the study results, the Stochastic Gradient Boosting machine-learning model classified patients with and without breast cancer with high accuracy, and the importance of the variables related to cancer status was determined. Factors with high variable importance can be considered potential risk factors associated with cancer status and can play an essential role in disease diagnosis.
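
For readers unfamiliar with the performance measures named in the abstract, the following is a minimal sketch of how they are conventionally derived from the counts of a binary confusion matrix. It is not the study's code: the function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels in the usage note are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    balanced_accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from label sequences.

    Hypothetical helper for illustration; `positive` marks the label
    treated as the disease class.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)  # true positive rate
    specificity = ratio(tn, tn + fp)  # true negative rate
    return BinaryMetrics(
        accuracy=ratio(tp + tn, len(pairs)),
        balanced_accuracy=(sensitivity + specificity) / 2,
        sensitivity=sensitivity,
        specificity=specificity,
        ppv=ratio(tp, tp + fp),  # positive predictive value
        npv=ratio(tn, tn + fn),  # negative predictive value
        f1=ratio(2 * tp, 2 * tp + fp + fn),
    )
```

As a usage example with made-up labels, `binary_metrics(["M", "M", "B", "B", "B"], ["M", "B", "B", "B", "B"], positive="M")` gives a sensitivity of 0.5 and a specificity of 1.0. A result of 100% on every metric, as reported above, corresponds to zero false positives and zero false negatives on the evaluated data.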