
Predicting Diabetes Progression Through Regression and Ensemble Learning: A Comparative Machine Learning Study

Asfand Ahmad Jamalee

(05 – 2026)

DOI:

 

Diabetes mellitus remains a global health crisis that calls for high-precision tools for early intervention. This research introduces GlycoSense, an ensemble framework designed to predict diabetes risk using the Pima Indians Diabetes Database (PIDD). The methodology uses a stacking strategy that integrates three heterogeneous base learners (Random Forest, Gradient Boosting, and XGBoost), optimized via 5-fold stratified cross-validation and combined through a Logistic Regression meta-learner. To support clinical validity, the system employs a preprocessing pipeline featuring class-stratified median imputation and SMOTE-based oversampling, both applied strictly within training folds to prevent data leakage. Experimental results show that the stacking architecture achieves an accuracy of 89.4% and an AUC-ROC of 0.94, significantly outperforming the standalone models and traditional regression baselines. To address the “black-box” challenge of ensemble methods, the study integrates SHAP (SHapley Additive exPlanations), which provides mathematically rigorous feature attribution and identifies Glucose, BMI, and Age as the primary drivers of progression. The findings indicate that combining ensemble learning with Explainable AI (XAI) yields a transparent, high-performance decision-support tool suitable for clinical integration.

Keywords: Diabetes Prediction, Stacking Ensemble, XGBoost, Explainable AI, SHAP, Clinical Decision Support.
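The leakage-free imputation step mentioned in the abstract can be illustrated with a minimal sketch. This is not the study's code: the function names, the toy values, and the two-column layout (glucose, BMI) are hypothetical. It also assumes that a value of 0 marks a missing measurement, as is conventional for several PIDD columns, and that held-out rows, whose labels must not be used, fall back to the overall training median. The medians are learned from the training fold only and then reused unchanged on the held-out fold.

```python
from statistics import median

MISSING = 0.0  # assumption: 0 encodes a missing measurement


def fit_class_medians(rows, labels, missing=MISSING):
    """Learn per-class and overall column medians from training rows only."""
    n_cols = len(rows[0])
    overall = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] != missing]
        overall.append(median(observed) if observed else missing)
    per_class = {}
    for c in set(labels):
        meds = []
        for j in range(n_cols):
            observed = [r[j] for r, y in zip(rows, labels)
                        if y == c and r[j] != missing]
            # Fall back to the overall median if a class has no observed value.
            meds.append(median(observed) if observed else overall[j])
        per_class[c] = meds
    return per_class, overall


def impute(rows, per_class, overall, labels=None, missing=MISSING):
    """Fill missing entries: class medians if labels are given, else overall."""
    filled = []
    for i, r in enumerate(rows):
        meds = per_class.get(labels[i], overall) if labels is not None else overall
        filled.append([meds[j] if v == missing else v for j, v in enumerate(r)])
    return filled


# Hypothetical toy fold: columns are (glucose, BMI); label 1 = diabetic.
train_X = [[148.0, 33.6], [0.0, 26.6], [183.0, 0.0],
           [89.0, 28.1], [137.0, 43.1], [116.0, 25.6]]
train_y = [1, 0, 1, 0, 1, 0]
test_X = [[0.0, 30.0]]

per_class, overall = fit_class_medians(train_X, train_y)
train_filled = impute(train_X, per_class, overall, labels=train_y)
test_filled = impute(test_X, per_class, overall)  # no labels used at test time
```

In a cross-validated setup, `fit_class_medians` would be called once per training fold, so that no statistic computed from validation rows influences the values filled in during training. Any oversampling step such as SMOTE would follow the same pattern and be fitted inside the fold.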

 

 
