An Explainable XGBoost Model That Maps Breast Cancer Nuclear Biomarkers to Specific Genes and Pathways
Keywords:
Breast cancer, XGBoost, explainable artificial intelligence (XAI), genetic biomarkers, nuclear morphometry, feature importance, cancer genetics
Abstract
Breast cancer remains a leading cause of cancer mortality worldwide, with over 2.26 million new cases and 684,000 deaths in 2020. Although next-generation sequencing has advanced mutation detection, interpreting high-dimensional morphological and genomic data remains challenging, and most machine learning models operate as “black boxes” that lack biological interpretability. This study develops an explainable AI framework that uses XGBoost on the Wisconsin Breast Cancer dataset (569 samples, 30 nuclear features) to classify malignancy and map morphological biomarkers to specific genes. The model achieved 97.3% accuracy, 0.98 sensitivity, 0.96 precision, and an AUC of 0.99. The top biomarkers (worst concave points, worst perimeter, and worst area) were linked to genes involved in nuclear envelope instability (LMNA, LMNB1), actin dysregulation (ACTN4, CTNNA1), aneuploidy (MYC, E2F1), and epigenetic changes (EZH2). Chromatin texture was only weakly correlated with nuclear size (r ≤ 0.37), suggesting separate genetic controls. Unlike prior studies that report accuracy without biological grounding, this work offers testable genetic hypotheses and a clinically actionable pre-screening tool for genetic laboratories that may reduce unnecessary invasive procedures and advance precision medicine.
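For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, sensitivity, and precision follow from a binary confusion matrix, with malignant treated as the positive class. The class name `ConfusionCounts`, the function `classification_metrics`, and the counts themselves are hypothetical illustrations, not the study's code or results. AUC is omitted because it requires the model's predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper; malignant = positive)."""
    tp: int  # malignant samples predicted malignant
    fp: int  # benign samples predicted malignant
    tn: int  # benign samples predicted benign
    fn: int  # malignant samples predicted benign


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, sensitivity (recall), and precision for the given counts."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0 or (c.tp + c.fn) == 0 or (c.tp + c.fp) == 0:
        raise ValueError("metrics are undefined for these counts")
    return {
        # Fraction of all samples classified correctly.
        "accuracy": (c.tp + c.tn) / total,
        # Fraction of truly malignant samples that were detected.
        "sensitivity": c.tp / (c.tp + c.fn),
        # Fraction of malignant predictions that were correct.
        "precision": c.tp / (c.tp + c.fp),
    }


# Toy counts chosen for illustration only; they are NOT taken from the study.
example = ConfusionCounts(tp=8, fp=1, tn=10, fn=1)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a pre-screening setting, sensitivity is usually the metric to watch most closely, since a false negative corresponds to a missed malignancy.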

This work is licensed under a Creative Commons Attribution 4.0 International License.