An Explainable XGBoost Model That Maps Breast Cancer Nuclear Biomarkers to Specific Genes and Pathways

Soha Mustafa Salih

Authors

Soha Mustafa Salih
soha.mohammed@uob.edu.ly
Department of Zoology, Faculty of Arts and Sciences - Al abyar, University of Benghazi, Al abyar, Libya

Keywords:

Breast cancer, XGBoost, explainable artificial intelligence (XAI), genetic biomarkers, nuclear morphometry, feature importance, cancer genetics

Abstract

Breast cancer remains a leading cause of cancer mortality worldwide, with over 2.26 million new cases and 684,000 deaths in 2020. Although next-generation sequencing has advanced mutation detection, interpreting high-dimensional morphological and genomic data remains challenging. Most machine learning models operate as “black boxes” that lack biological interpretability. This study develops an explainable AI framework using XGBoost on the Wisconsin Breast Cancer dataset (569 samples, 30 nuclear features) to classify malignancy and map morphological biomarkers to specific genes. The model achieved 97.3% accuracy, 0.98 sensitivity, 0.96 precision, and an AUC of 0.99. The top biomarkers (worst concave points, worst perimeter, and worst area) were genetically linked to nuclear envelope instability (LMNA, LMNB1), actin dysregulation (ACTN4, CTNNA1), aneuploidy (MYC, E2F1), and epigenetic changes (EZH2). Chromatin texture was only weakly correlated with nuclear size (r ≤ 0.37), suggesting separate genetic controls. Unlike prior studies that report accuracy without biological grounding, this work offers testable genetic hypotheses and a clinically actionable pre-screening tool for genetic laboratories, reducing unnecessary invasive procedures and advancing precision medicine.
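As a reading aid for the figures reported in the abstract, the sketch below shows how accuracy, sensitivity, and precision follow from confusion-matrix counts, and how a Pearson correlation coefficient r is computed between two features. It is a minimal illustration using the standard definitions, not the study's code. The function names `classification_metrics` and `pearson_r`, the counts, and the feature values are all hypothetical and are not taken from the paper or from the Wisconsin dataset.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (malignant = positive class)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # share of malignant samples that were detected
        "precision": tp / (tp + fp),     # share of malignant predictions that were correct
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts for an imagined held-out set (NOT the study's confusion matrix).
metrics = classification_metrics(tp=40, fp=2, tn=70, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Hypothetical feature values (NOT real measurements): a size-like and a texture-like feature.
size = [1.0, 2.0, 3.0, 4.0, 5.0]
texture = [3.0, 1.0, 4.0, 2.0, 3.0]
r = pearson_r(size, texture)
print(f"r = {r:.3f}")
```

A small |r|, such as the r ≤ 0.37 reported above, indicates a weak linear relationship between two features. It does not by itself rule out nonlinear dependence.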
