Stage 2B — MRI Neurodegeneration Gate
A longitudinal-hybrid XGBoost classifier for structural MRI-based detection of neurodegeneration (N+) in the Alzheimer's continuum, trained and validated on the ADNI cohort.
| Metric | Value |
|---|---|
| AUC | 0.993 |
| Accuracy | 97.3% |
| Sensitivity | 88.9% |
| Specificity | 98.8% |
1. Clinical Rationale
The AT(N) biomarker framework classifies Alzheimer's pathology along three axes: amyloid (A), tau (T), and neurodegeneration (N). The neurodegeneration axis — assessed via structural MRI — captures downstream atrophy that correlates with symptom severity and progression rate.
Stage 2B serves as the neurodegeneration gate in our three-stage pipeline. It takes baseline MRI volumes and longitudinal atrophy trajectories as input and outputs a binary N+/N− classification alongside a continuous risk probability. This result feeds directly into the decision support layer, where it modulates treatment pathway routing, PET cost-benefit simulation, and uncertainty flagging.
Critically, Stage 2B does not require a positive amyloid signal to fire. Neurodegeneration can precede or occur independently of amyloid positivity, and detecting it early — even in amyloid-negative patients — may redirect clinical attention toward alternative neurodegenerative diagnoses.
2. Dataset and Cohort
The model was trained on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Baseline visits were extracted using VISCODE = "bl", yielding 2,430 participants after merging with longitudinal slope estimates. Participants with missing target labels were excluded.
| Parameter | Value |
|---|---|
| Dataset | ADNI (adnimerge.csv) |
| Total samples | 2,430 |
| Train / Test | 80% / 20% stratified |
| N+ prevalence | 14.8% (class imbalance handled via scale_pos_weight) |
| Baseline visit | VISCODE = bl |
| Longitudinal | Slope computed from all available MRI timepoints |
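Cohort assembly comes down to three steps: filter to baseline visits, merge in slopes, and drop missing targets. A class-stratified 80/20 split follows. The sketch below is a minimal illustration under stated assumptions. It uses pandas, assumes a participant identifier column named RID, assumes a target column already attached to each row, and assumes a separate slopes table with one row per participant. baseline_cohort and stratified_split are hypothetical helpers, not the pipeline's actual code.

```python
import numpy as np
import pandas as pd


def baseline_cohort(adnimerge: pd.DataFrame, slopes: pd.DataFrame, target_col: str) -> pd.DataFrame:
    """Baseline rows merged with per-participant slopes, minus missing targets."""
    # Keep only the baseline visit (VISCODE == "bl") for each participant.
    baseline = adnimerge[adnimerge["VISCODE"] == "bl"]
    # Attach longitudinal slope estimates; assumes one slopes row per RID (hypothetical key).
    merged = baseline.merge(slopes, on="RID", how="inner")
    # Exclude participants whose target label is missing.
    return merged.dropna(subset=[target_col]).reset_index(drop=True)


def stratified_split(y, test_frac: float = 0.2, seed: int = 0):
    """Return (train_idx, test_idx) with the class ratio preserved in each part."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    test_mask = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        # Shuffle this class's indices and send test_frac of them to the test set.
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        n_test = int(round(test_frac * len(idx)))
        test_mask[idx[:n_test]] = True
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Usage with the class sizes implied above: 360 N+ and 2,070 N- participants.
y = np.array([1] * 360 + [0] * 2070)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx), int(y[test_idx].sum()))  # → 1944 486 72
```

With these class sizes, a 20% per-class split yields 72 N+ and 414 N− test cases. These are the supports reported in the performance table.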
3. Target Definition
Neurodegeneration status (N+) was defined using an age-adjusted hippocampal volume residual, operationalized as follows:
- Hippocampal volume was normalized by intracranial volume (ICV) to produce a size-corrected ratio.
- A linear regression model was fitted to predict normalized hippocampal volume from age, establishing an expected value per age.
- The residual (observed minus expected) was z-scored across the cohort.
- Participants with a hippocampal z-score below −1.0 were classified as N+ (neurodegeneration present). This cutoff was chosen to balance sensitivity and specificity, rather than restricting N+ to the extreme tail of the distribution.
Design note: This is a data-driven neurodegeneration proxy, not a histopathological ground truth. The z-score threshold of −1.0 was selected empirically to achieve clinically meaningful sensitivity while maintaining high specificity for this triage application.
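The four steps above can be sketched with NumPy as follows. neurodegeneration_target is a hypothetical name, and two details are assumptions: the population standard deviation (ddof=0) is used for the z-score, and the age regression is an ordinary least-squares line. Under a normal residual distribution, a z < −1.0 cutoff flags about 15.9% of participants, close to the 14.8% prevalence reported above.

```python
import numpy as np


def neurodegeneration_target(hippocampus, icv, age, z_cutoff=-1.0):
    """Age-adjusted, ICV-normalized hippocampal z-score and N+ label."""
    # Step 1: size-correct hippocampal volume by intracranial volume.
    ratio = np.asarray(hippocampus, dtype=float) / np.asarray(icv, dtype=float)
    age = np.asarray(age, dtype=float)
    # Step 2: a linear fit of the ratio on age gives the expected value per age.
    slope, intercept = np.polyfit(age, ratio, deg=1)
    expected = intercept + slope * age
    # Step 3: z-score the residual (observed minus expected) across the cohort.
    residual = ratio - expected
    z = (residual - residual.mean()) / residual.std()
    # Step 4: N+ when the z-score falls below the cutoff.
    return z, z < z_cutoff


# Synthetic cohort (not ADNI data): the ratio declines linearly with age plus noise.
rng = np.random.default_rng(0)
n = 5000
age = rng.uniform(55, 90, n)
icv = rng.normal(1.5e6, 1.5e5, n)
ratio = 0.005 - 3e-5 * (age - 55) + rng.normal(0, 4e-4, n)
z, n_pos = neurodegeneration_target(ratio * icv, icv, age)
print(f"N+ fraction: {n_pos.mean():.3f}")
```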
4. Feature Engineering
The feature set combines baseline MRI volumes with longitudinal atrophy trajectories (slopes), computed by fitting a linear regression of volume against time (in days from first visit) for each participant. Missing slope values — arising when fewer than two timepoints were available — were median imputed, with binary missingness indicators retained as auxiliary features.
| Feature Group | Features |
|---|---|
| Baseline MRI volumes | Hippocampus, Entorhinal, Ventricles, WholeBrain, ICV |
| Longitudinal slopes | Hippocampus_slope, Ventricles_slope, WholeBrain_slope |
| Covariates | AGE, APOE4 |
| Missingness indicators | One binary indicator per feature (10 total) |
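The slope and imputation steps can be sketched with pandas. This sketch assumes a long-format table with one row per visit, a participant column RID, and a visit-date column EXAMDATE (assumed names). It fits an ordinary least-squares line of volume against days since each participant's first visit, leaves the slope missing when fewer than two distinct timepoints exist, and then median-imputes with a binary indicator per column. atrophy_slopes and impute_with_indicators are hypothetical helpers. Fitting the medians on the training split only is a design choice assumed here, not stated in the source.

```python
import numpy as np
import pandas as pd

VOLUMES = ["Hippocampus", "Ventricles", "WholeBrain"]


def atrophy_slopes(long_df: pd.DataFrame) -> pd.DataFrame:
    """Per-participant OLS slope of each volume against days from first visit."""
    df = long_df.copy()
    df["EXAMDATE"] = pd.to_datetime(df["EXAMDATE"])
    first = df.groupby("RID")["EXAMDATE"].transform("min")
    df["days"] = (df["EXAMDATE"] - first).dt.days
    rows = []
    for rid, g in df.groupby("RID"):
        row = {"RID": rid}
        for col in VOLUMES:
            obs = g[["days", col]].dropna()
            # A slope needs at least two distinct timepoints; otherwise leave it missing.
            if obs["days"].nunique() >= 2:
                row[f"{col}_slope"] = np.polyfit(obs["days"], obs[col], deg=1)[0]
            else:
                row[f"{col}_slope"] = np.nan
        rows.append(row)
    return pd.DataFrame(rows)


def impute_with_indicators(X: pd.DataFrame, medians=None):
    """Median-impute each column and add a binary <col>_missing indicator."""
    if medians is None:
        medians = X.median()  # fit on the training split, then reuse for test data
    out = X.copy()
    for col in X.columns:
        out[f"{col}_missing"] = X[col].isna().astype(int)
        out[col] = X[col].fillna(medians[col])
    return out, medians


# Participant 1 has two visits one year apart; participant 2 has a single visit.
long_df = pd.DataFrame({
    "RID": [1, 1, 2],
    "EXAMDATE": ["2010-01-01", "2011-01-01", "2010-06-01"],
    "Hippocampus": [7000.0, 6927.0, 6500.0],
    "Ventricles": [30000.0, 31095.0, 40000.0],
    "WholeBrain": [1.00e6, 0.99e6, 0.98e6],
})
slopes = atrophy_slopes(long_df)
features, medians = impute_with_indicators(slopes.drop(columns="RID"))
print(features)
```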
5. Model Architecture
An XGBoost gradient-boosted tree classifier was trained with the following hyperparameters. Class imbalance (14.8% N+) was addressed via scale_pos_weight, set to the ratio of negative to positive training examples.
| Hyperparameter | Value |
|---|---|
| n_estimators | 400 |
| max_depth | 4 |
| learning_rate | 0.05 |
| subsample | 0.9 |
| colsample_bytree | 0.9 |
| scale_pos_weight | Negative / Positive ratio (≈5.75) |
| eval_metric | logloss |
| Classification threshold | 0.5 (balanced operating point) |
Unlike Stage 2A (which uses a sensitivity-forced threshold of ~0.90), Stage 2B operates at a balanced threshold of 0.5. This reflects a different clinical trade-off: false positives in the neurodegeneration gate trigger additional workup rather than immediate therapy, making balanced precision more appropriate than maximal recall.
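The sketch below shows how the hyperparameter table and the balanced threshold could be expressed in code. The dictionary keys are keyword arguments accepted by xgboost.XGBClassifier (recent versions accept eval_metric in the constructor), so a model could be built as XGBClassifier(**params). The helpers scale_pos_weight, xgb_params, and classify are hypothetical. The test uses the training class counts implied by the reported cohort (1,656 N− and 288 N+), which reproduce the ≈5.75 ratio.

```python
import numpy as np


def scale_pos_weight(y_train) -> float:
    """Ratio of negative to positive training examples."""
    y = np.asarray(y_train)
    return int((y == 0).sum()) / int((y == 1).sum())


def xgb_params(y_train) -> dict:
    """Hyperparameters from the table, ready to pass to an XGBoost classifier."""
    return {
        "n_estimators": 400,
        "max_depth": 4,
        "learning_rate": 0.05,
        "subsample": 0.9,
        "colsample_bytree": 0.9,
        "scale_pos_weight": scale_pos_weight(y_train),
        "eval_metric": "logloss",
    }


def classify(prob_n_pos, threshold: float = 0.5):
    """Binary N+ call at the balanced operating point (ties count as N+ here)."""
    return (np.asarray(prob_n_pos) >= threshold).astype(int)


y_train = np.array([0] * 1656 + [1] * 288)
params = xgb_params(y_train)
print(params["scale_pos_weight"])  # → 5.75
```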
6. Feature Importance (SHAP)
SHAP (SHapley Additive exPlanations) values were computed using a kernel explainer on a 100-sample background subset. Mean absolute SHAP values reflect the average magnitude of each feature's contribution to the model output across the test set, regardless of sign.
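Features ranked by mean absolute SHAP value (highest first):

| Feature | Interpretation |
|---|---|
| Hippocampus | Primary structural marker of neurodegeneration in Alzheimer's disease |
| ICV | Normalization reference for brain volume measurements |
| AGE | Age-adjusted residual accounts for expected atrophy trajectory |
| WholeBrain | Global atrophy marker |
| Ventricles | Enlargement associated with parenchymal loss |
| Entorhinal | Early site of Alzheimer's involvement; transentorhinal staging |
| APOE4 | Genetic risk modifier |
| WholeBrain_slope | Longitudinal slope: rate of global atrophy |
| Hippocampus_slope | Longitudinal slope: rate of hippocampal volume loss |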
Key findings
- Hippocampal volume dominates the model (mean |SHAP| = 0.202), accounting for more than twice the combined contribution of all other features. This is consistent with the established primacy of hippocampal atrophy as an AD neurodegeneration marker.
- ICV and age are the next most important features, reflecting the normalization structure of the target variable (age-adjusted hippocampal volume residual).
- Longitudinal slopes (Hippocampus_slope, WholeBrain_slope) show low SHAP values in this model. This may be because baseline cross-sectional volume is already a strong predictor when the target is also defined from baseline volume, which reduces the marginal value of slopes. Their contribution is expected to increase in prospective settings with longer follow-up.
- APOE4 contributes minimally (mean |SHAP| = 0.0009) to the MRI-based model, consistent with its role as a genetic risk modifier rather than a direct structural marker.
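SHAP's kernel explainer estimates Shapley values by sampling feature coalitions and fitting a weighted linear model, filling in "absent" features from the background set. With only a few features, the same quantity can be computed exactly by enumerating every coalition. The sketch below does that for a toy linear scorer (a stand-in, not the Stage 2B model). It then takes the per-feature mean of absolute values, which is the statistic summarized in this section. exact_shapley and mean_abs_shap are hypothetical helpers. Enumeration is exponential in the number of features, which is why sampling-based explainers are used in practice.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, background):
    """Exact Shapley values of f at x; absent features keep background values."""
    n = x.shape[0]

    def value(coalition):
        # Features in the coalition take x's values; average f over the background.
        data = background.copy()
        cols = list(coalition)
        if cols:
            data[:, cols] = x[cols]
        return f(data).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def mean_abs_shap(f, X_eval, background):
    """Per-feature mean |SHAP| over the evaluation rows."""
    values = np.array([exact_shapley(f, x, background) for x in X_eval])
    return np.abs(values).mean(axis=0)


def score(X):
    # Toy linear scorer with three features; feature 2 has zero weight.
    return X @ np.array([2.0, -0.5, 0.0])


rng = np.random.default_rng(0)
background = rng.normal(size=(100, 3))  # stands in for the 100-sample background
X_eval = rng.normal(size=(20, 3))
phi = exact_shapley(score, X_eval[0], background)
importance = mean_abs_shap(score, X_eval, background)
print(np.round(importance, 3))
```

For a linear scorer, each Shapley value reduces to the weight times the feature's deviation from its background mean. The values also sum to the prediction minus the average background prediction.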
7. Model Performance
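Performance was evaluated on a held-out 20% stratified test set (n = 486). The model achieves near-perfect AUC (0.993) and high specificity (98.8%, a false-positive rate of about 1.2%). A low false-positive rate is desirable for a gating classifier that triggers additional downstream workup. Because the target is computed from three of the input features (Hippocampus, ICV, and AGE), these figures partly reflect the model reconstructing its own target definition and should not be read as independent diagnostic accuracy.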
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| N− (no neurodegeneration) | 0.98 | 0.99 | 0.98 | 414 |
| N+ (neurodegeneration) | 0.93 | 0.89 | 0.91 | 72 |
| Macro avg | 0.95 | 0.94 | 0.95 | 486 |
| Weighted avg | 0.97 | 0.97 | 0.97 | 486 |
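The table can be cross-checked from a confusion matrix. The counts below are back-calculated from the reported sensitivity, specificity, and class supports rather than read from raw predictions. They are the only integer counts consistent with those rounded rates: 64 of 72 N+ cases and 409 of 414 N− cases classified correctly. binary_metrics is a hypothetical helper.

```python
# Confusion-matrix counts back-calculated from reported rates and supports
# (an assumption; not taken from the model's raw predictions).
TP, FN = 64, 8     # N+ cases: 64 / 72 -> 88.9% sensitivity
TN, FP = 409, 5    # N- cases: 409 / 414 -> 98.8% specificity


def binary_metrics(tp, fn, tn, fp):
    """Per-class precision/recall/F1 plus accuracy for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),            # recall of N+
        "specificity": tn / (tn + fp),            # recall of N-
        "precision_pos": tp / (tp + fp),
        "precision_neg": tn / (tn + fn),
        "f1_pos": 2 * tp / (2 * tp + fp + fn),
        "f1_neg": 2 * tn / (2 * tn + fn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


m = binary_metrics(TP, FN, TN, FP)
for name, value in m.items():
    print(f"{name:>13}: {value:.3f}")
```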
8. Clinical Integration
Stage 2B output feeds into three downstream decision support tools:
Risk Stratification (Tool 2)
N+ status elevates the patient to the HIGH_RISK or URGENT tier regardless of amyloid probability, reflecting the clinical significance of MRI-detected neurodegeneration.
Treatment Pathway Router (Tool 3)
N+ triggers neurology referral pathways and may activate cholinesterase inhibitor (ChEI) eligibility evaluation under National Health Insurance (NHI) benefit codes, subject to a confirmed dementia diagnosis.
Uncertainty Guard (Tool 6)
Conflicting signals between Stage 2A (plasma amyloid) and Stage 2B (MRI neurodegeneration) — e.g. N+ with low amyloid probability — trigger an inter-stage conflict flag and route the case for specialist review.
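The routing rules above can be sketched as a small function. Only the HIGH_RISK and URGENT tier names, the 0.5 operating point, and the N+ with low amyloid conflict come from this document. The LOW and MODERATE tiers, the 0.3 "low amyloid probability" cutoff, the base_tier input from other stages, and all function and field names are hypothetical. The sketch floors the tier at HIGH_RISK for N+ cases, because the rule that decides between HIGH_RISK and URGENT is not specified here.

```python
from dataclasses import dataclass

# Only HIGH_RISK and URGENT appear in the source; LOW and MODERATE are placeholders.
TIERS = ["LOW", "MODERATE", "HIGH_RISK", "URGENT"]
N_POS_THRESHOLD = 0.5    # Stage 2B balanced operating point
LOW_AMYLOID_PROB = 0.3   # hypothetical cutoff for "low amyloid probability"


@dataclass
class RoutingDecision:
    tier: str
    neurology_referral: bool
    inter_stage_conflict: bool


def route(base_tier: str, p_n_pos: float, p_amyloid: float) -> RoutingDecision:
    """Combine a tier from other stages with Stage 2B and Stage 2A outputs."""
    n_pos = p_n_pos >= N_POS_THRESHOLD
    tier = base_tier
    # Tool 2: N+ raises the tier to at least HIGH_RISK, whatever the amyloid signal.
    if n_pos and TIERS.index(tier) < TIERS.index("HIGH_RISK"):
        tier = "HIGH_RISK"
    # Tool 6: N+ with low amyloid probability is flagged for specialist review.
    conflict = n_pos and p_amyloid < LOW_AMYLOID_PROB
    # Tool 3: N+ triggers the neurology referral pathway.
    return RoutingDecision(tier=tier, neurology_referral=n_pos, inter_stage_conflict=conflict)


print(route("LOW", p_n_pos=0.8, p_amyloid=0.1))
```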
9. Limitations
- The model was trained exclusively on ADNI data, which skews toward older, highly educated, predominantly North American participants. Performance may differ in Korean community clinic populations with different demographic and comorbidity profiles.
- The neurodegeneration target is a surrogate derived from hippocampal volume z-scores, not a pathologically confirmed ground truth. It captures one dimension of neurodegeneration and may miss tau-driven or non-hippocampal atrophy patterns.
- Longitudinal slopes showed low SHAP importance in this model. This is partly an artifact of the target definition, which is also cross-sectional at baseline. Retraining prospectively on incident neurodegeneration outcomes would better evaluate slope utility.
- MRI acquisition protocols vary across sites and scanner generations. Volumetric features are sensitive to field strength, voxel size, and segmentation software, so deployment requires protocol harmonization or domain adaptation.
- The model does not incorporate tau PET, CSF biomarkers, or white matter hyperintensity burden, which provide complementary information on tau pathology, neuronal injury, and cerebrovascular burden.
- This tool is intended for research-use clinical decision support only. It is not cleared as a medical device and must not be used as a standalone diagnostic instrument.