Clinovia.ai
White Paper | Plasma Biomarker | Amyloid (A) | XGBoost

Stage 2A — Plasma Amyloid Triage

A sensitivity-optimized XGBoost classifier for predicting amyloid PET positivity from blood-based biomarkers and clinical covariates, validated against FBB and AV45 PET imaging in the ADNI cohort.

Mean AUC (CV): 0.915
Sensitivity: 90.0%
Specificity: 73.7%
Threshold: 0.217

1. Clinical Rationale

Amyloid PET imaging is the gold standard for detecting cerebral amyloid burden, a defining pathological feature of Alzheimer's disease. However, PET access is severely limited in community settings: scanners are concentrated in tertiary academic centers, costs range from ₩800,000 to ₩2,000,000 per scan in Korea, and reimbursement under National Health Insurance (NHI) is restricted to specific indications requiring specialist referral.

Blood-based biomarkers — particularly plasma p-tau217 and the Aβ42/40 ratio — have demonstrated strong concordance with amyloid PET positivity in recent large-scale studies. Stage 2A operationalizes this evidence into a triage classifier: patients with high plasma amyloid probability are flagged for confirmatory PET or alternative pathway routing, while low-risk patients are spared unnecessary imaging.

The model is calibrated for high sensitivity (90.0%) to minimize missed amyloid-positive cases at this triage stage. False positives — patients flagged as amyloid-positive who are PET-negative — proceed to the fusion layer where MRI neurodegeneration data and clinical context can recalibrate the overall risk estimate.

2. Dataset and Cohort

The model was trained on ADNI participants who had both plasma biomarker measurements and amyloid PET imaging available. PET positivity was determined using standardized uptake value ratio (SUVR) thresholds for two tracers: florbetaben (FBB) and florbetapir (AV45).

Parameter | Value
Dataset | ADNI (adnimerge.csv + PET imaging data)
Total samples | 396
PET tracer: FBB | 251 participants (63.4%)
PET tracer: AV45 | 145 participants (36.6%)
Amyloid positive | 160 participants (40.4%)
Amyloid negative | 236 participants (59.6%)
Validation strategy | 5-fold stratified cross-validation
Features | 8 (4 plasma biomarkers + 4 non-plasma covariates)

Multi-tracer design: Combining FBB and AV45 PET data increases sample size and tracer diversity, but introduces inter-tracer variability in SUVR thresholds. Both tracers were harmonized to a binary amyloid-positive/negative label using published tracer-specific cutoffs before model training.
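The harmonization step described above can be sketched as a per-tracer lookup followed by a comparison. This is a minimal illustration, not the pipeline's actual code: the function name, the cutoff values, and the use of `>=` at the boundary are all assumptions, since this document does not state the cutoffs that were applied.

```python
# Illustrative SUVR cutoffs per tracer. These numbers are placeholders
# (assumptions), not values taken from this white paper; substitute the
# published cutoffs actually used for each tracer and reference region.
ILLUSTRATIVE_CUTOFFS = {"FBB": 1.08, "AV45": 1.11}


def amyloid_label(tracer: str, suvr: float, cutoffs: dict) -> int:
    """Return 1 (A+) if the SUVR meets the tracer's cutoff, else 0 (A-)."""
    if tracer not in cutoffs:
        raise ValueError(f"no cutoff configured for tracer {tracer!r}")
    return int(suvr >= cutoffs[tracer])


# Hypothetical scans: (tracer, SUVR)
scans = [("FBB", 1.25), ("FBB", 1.02), ("AV45", 1.30), ("AV45", 1.05)]
labels = [amyloid_label(t, s, ILLUSTRATIVE_CUTOFFS) for t, s in scans]
print(labels)
```

Keeping the cutoffs in one explicit mapping makes it harder to apply the wrong tracer's threshold to a scan, and an unknown tracer fails loudly rather than being labeled silently.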

3. Target Definition

The binary target variable represents amyloid PET positivity, defined using tracer-specific SUVR thresholds established in the ADNI imaging protocol. A value of 1 indicates amyloid-positive (A+); 0 indicates amyloid-negative (A−).

Class | Label | Count | Proportion
0 | Amyloid Negative (A−) | 236 | 59.6%
1 | Amyloid Positive (A+) | 160 | 40.4%

4. Feature Set

The model uses four plasma biomarkers and four non-plasma covariates (APOE ε4 genotype, age, education, and MMSE). All plasma measurements were obtained from a single blood draw and processed using standard ADNI protocols. Missing values were median-imputed, and a binary missingness indicator was retained for each imputed feature.

Feature | Group | Description
PLASMA_PTAU217 | Plasma Biomarker | Phosphorylated tau 217, the strongest plasma predictor of amyloid PET positivity in current literature. Elevated levels reflect early tau phosphorylation downstream of amyloid accumulation.
PLASMA_ABETA_RATIO | Plasma Biomarker | Ratio of amyloid beta 42 to amyloid beta 40 peptides. A lower ratio reflects preferential sequestration of Aβ42 into amyloid plaques, making it an indirect blood-based marker of amyloid burden.
PLASMA_NfL | Plasma Biomarker | Neurofilament light chain, a non-specific marker of axonal damage and neurodegeneration. Elevated NfL is associated with faster progression across multiple neurodegenerative conditions including AD.
APOE4 | Genetic | Number of APOE ε4 alleles (0, 1, or 2). The strongest genetic risk factor for late-onset Alzheimer's disease, influencing amyloid clearance and accumulation rate.
PLASMA_GFAP | Plasma Biomarker | Glial fibrillary acidic protein, a marker of astrocyte activation and neuroinflammation. Elevated in early Alzheimer's disease, often preceding symptom onset.
AGE | Demographic | Age in years. Amyloid accumulation accelerates with age; older patients have higher prior probability of PET positivity independent of biomarker values.
EDUCATION | Demographic | Total years of formal education. Associated with cognitive reserve; higher education may delay symptom expression but does not reduce amyloid burden.
MMSE | Cognitive | Mini-Mental State Examination score. Contributed minimal importance in this model: amyloid burden is not strongly correlated with current cognitive status in early disease stages.
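Median imputation with a retained missingness indicator, as described for this feature set, might look like the sketch below. The helper name and the example values are hypothetical, and the document does not say whether medians were computed per training fold; doing so is the usual way to avoid leaking validation data into the imputation.

```python
from statistics import median


def impute_with_indicator(values):
    """Median-impute None entries; return (imputed_values, missing_flags)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [int(v is None) for v in values]  # 1 where the value was missing
    return imputed, flags


# Hypothetical PLASMA_GFAP column with two missing measurements.
gfap = [120.0, None, 95.0, 210.0, None]
gfap_imputed, gfap_missing = impute_with_indicator(gfap)
print(gfap_imputed)
print(gfap_missing)
```

The indicator column lets a tree model learn whether missingness itself carries signal, instead of treating an imputed median as a real measurement.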

5. Model Architecture

An XGBoost gradient-boosted tree classifier was trained using 5-fold stratified cross-validation for robust performance estimation. The classification threshold was optimized post-training to achieve 90% sensitivity on the validation folds, reflecting the screening-first design philosophy of Stage 2A.

Parameter | Value
Algorithm | XGBoost (gradient boosted trees)
Validation | 5-fold stratified cross-validation
Classification threshold | 0.2169 (sensitivity-optimized)
Sensitivity target | ≥ 0.90
Feature count | 8 (+ missingness indicators)
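One way to pick a sensitivity-targeted threshold from validation predictions is to take the highest cutoff that still meets the target, which preserves as much specificity as possible. The sketch below illustrates that idea on made-up scores. The function name and data are hypothetical, and the document does not specify how the 0.2169 threshold was actually searched (for example, on pooled out-of-fold predictions or per fold).

```python
def pick_threshold(y_true, y_prob, target_sensitivity=0.90):
    """Highest threshold t such that flagging prob >= t meets the target."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    if not positives:
        raise ValueError("need at least one positive case")
    best = min(y_prob)  # flagging everyone always gives sensitivity 1.0
    # Sensitivity can only fall as the threshold rises, so scan upward
    # and stop at the first candidate that misses the target.
    for thr in sorted(set(y_prob)):
        sensitivity = sum(p >= thr for p in positives) / len(positives)
        if sensitivity < target_sensitivity:
            break
        best = thr
    return best


# Hypothetical validation predictions: 10 positives, 5 negatives.
y_true = [1] * 10 + [0] * 5
y_prob = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.10,
          0.35, 0.25, 0.20, 0.15, 0.05]
thr = pick_threshold(y_true, y_prob)
print(thr)
```

In this toy example the selected threshold is 0.30, which captures 9 of 10 positives while flagging only 1 of 5 negatives.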

Cross-Validation AUC by Fold

Fold 1: 0.9642
Fold 2: 0.9092
Fold 3: 0.9342
Fold 4: 0.8660
Fold 5: 0.9016
Mean AUC: 0.9150 (Std: ± 0.0329)
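The summary statistics can be reproduced from the five fold values with Python's standard library. The reported ± 0.0329 matches the population standard deviation (dividing by n); the sample form (dividing by n − 1) gives a somewhat larger value of about 0.037. Which form the authors intended is not stated, so this is only a consistency check.

```python
from statistics import mean, pstdev, stdev

# Per-fold AUC values as reported above.
fold_auc = [0.9642, 0.9092, 0.9342, 0.8660, 0.9016]

auc_mean = mean(fold_auc)
auc_pstd = pstdev(fold_auc)  # population standard deviation (divide by n)
auc_sstd = stdev(fold_auc)   # sample standard deviation (divide by n - 1)

print(f"mean AUC       = {auc_mean:.4f}")
print(f"population std = {auc_pstd:.4f}")
print(f"sample std     = {auc_sstd:.4f}")
```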

Threshold design: A threshold of 0.2169 means the model flags amyloid positivity when estimated probability exceeds 21.7%. This aggressive threshold catches 90% of true amyloid-positive cases at the cost of a 26.3% false positive rate — acceptable in a pipeline where false positives receive MRI gating (Stage 2B) and fusion-layer recalibration before any clinical action is taken.

6. Feature Importance (XGBoost)

Feature importance was computed using XGBoost's built-in gain-based importance metric, which measures the average improvement in the loss function brought by a feature across all splits where it is used. Unlike mean absolute SHAP, gain-based importance is not sample-averaged and reflects structural contribution to the tree ensemble. The values reported here are raw gains rather than normalized shares, so they do not sum to 1.

Feature | Group | Gain importance
Plasma p-tau217 | Plasma Biomarker | 1.1412
Plasma Aβ42/40 Ratio | Plasma Biomarker | 1.0538
Neurofilament Light Chain (NfL) | Plasma Biomarker | 0.2757
APOE ε4 Allele Count | Genetic | 0.2690
GFAP | Plasma Biomarker | 0.1974
Age | Demographic | 0.0463
Years of Education | Demographic | 0.0178
MMSE Score | Cognitive | 0.0048 (minimal)

Feature descriptions are as listed in Section 4.

Key findings

  • p-tau217 and Aβ42/40 dominate: Plasma p-tau217 (importance = 1.141) and the Aβ42/40 ratio (1.054) together account for approximately 73% of the summed gain across the eight listed features (2.195 of 3.006). This is consistent with the recent literature establishing p-tau217 as the single most accurate blood-based predictor of amyloid PET positivity, outperforming p-tau181 and Aβ42/40 alone.
  • NfL and GFAP as secondary signals: NfL (0.276) and GFAP (0.197) contribute meaningfully as markers of neurodegeneration and neuroinflammation respectively. Their contribution likely reflects co-pathology — amyloid-positive patients tend to have elevated neurodegeneration markers — rather than direct amyloid burden.
  • APOE4 as genetic prior: APOE4 (0.269) contributes at a similar level to NfL, reflecting its strong association with amyloid accumulation rate. It is particularly informative in borderline plasma biomarker cases where the genetic risk prior shifts the probability estimate meaningfully.
  • MMSE near-zero importance: MMSE (0.005) contributes essentially nothing, consistent with the known dissociation between amyloid burden and current cognitive status in early and preclinical AD. Patients with high amyloid load may have normal MMSE scores, making it unreliable as an amyloid triage feature.
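The relative contributions discussed in these findings can be checked by normalizing the reported gain values. The short sketch below does this for the eight listed features only; the missingness indicators are left out because their importances are not reported, which is an assumption of this calculation. On these numbers, p-tau217 and Aβ42/40 together come to roughly 73% of the summed gain.

```python
# Gain-based importances as reported in Section 6.
gain = {
    "PLASMA_PTAU217": 1.1412,
    "PLASMA_ABETA_RATIO": 1.0538,
    "PLASMA_NfL": 0.2757,
    "APOE4": 0.2690,
    "PLASMA_GFAP": 0.1974,
    "AGE": 0.0463,
    "EDUCATION": 0.0178,
    "MMSE": 0.0048,
}

total = sum(gain.values())
share = {name: value / total for name, value in gain.items()}
top_two = share["PLASMA_PTAU217"] + share["PLASMA_ABETA_RATIO"]

# Print features from largest to smallest share of summed gain.
for name, s in sorted(share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20}{s:6.1%}")
print(f"p-tau217 + Abeta42/40 ratio: {top_two:.1%}")
```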

7. Model Performance

Performance was estimated via 5-fold stratified cross-validation. The mean AUC of 0.915 (± 0.033) indicates strong and consistent discrimination across folds. Fold 4 showed the lowest AUC (0.866), likely reflecting sampling variability in a relatively small dataset (n = 396).

Mean AUC: 0.915 (5-fold CV)
AUC Std: ±0.033 (cross-fold variability)
Sensitivity: 90.0% (at threshold 0.217)
Specificity: 73.7% (at threshold 0.217)

Specificity trade-off: At 73.7% specificity, approximately 1 in 4 amyloid-negative patients will be flagged as high-risk at this stage. This is an intentional design choice: Stage 2A is a triage tool, not a confirmatory test. False positives are resolved by the MRI gate (Stage 2B) and the fusion layer before any clinical decision is finalized.
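To make the trade-off concrete, the reported rates can be applied to the cohort's class counts (160 A+, 236 A−). This is a back-of-envelope sketch that assumes the cross-validated sensitivity and specificity hold across the whole cohort. It gives roughly 144 true positives, 16 false negatives, 174 true negatives, and 62 false positives, for a positive predictive value near 0.70 and a negative predictive value near 0.92. The 20% prevalence used in the second call is a hypothetical figure, included only to show how PPV falls when amyloid prevalence is lower than in ADNI.

```python
N_POS, N_NEG = 160, 236                    # class counts from Section 2
SENSITIVITY, SPECIFICITY = 0.900, 0.737    # reported at threshold 0.217


def predictive_values(prevalence, sensitivity=SENSITIVITY, specificity=SPECIFICITY):
    """Return (PPV, NPV) at a given amyloid prevalence via Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Expected counts if the reported rates held across the full cohort.
tp = SENSITIVITY * N_POS
fn = N_POS - tp
tn = SPECIFICITY * N_NEG
fp = N_NEG - tn
print(f"TP={tp:.0f} FN={fn:.0f} TN={tn:.0f} FP={fp:.0f}")

cohort_ppv, cohort_npv = predictive_values(N_POS / (N_POS + N_NEG))
low_prev_ppv, _ = predictive_values(0.20)  # hypothetical lower-prevalence setting
print(f"cohort PPV={cohort_ppv:.3f} NPV={cohort_npv:.3f}")
print(f"PPV at 20% prevalence={low_prev_ppv:.3f}")
```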

8. Pipeline Integration

Stage 2A runs in parallel with Stage 2B (MRI neurodegeneration gate) following a HIGH_RISK_PROGRESSOR flag from Stage 1. Its output feeds into three downstream systems:

Risk Stratification (Tool 2)

amyloid_positive_probability is the primary input for tier assignment. Values ≥ 0.85 combined with N+ from Stage 2B trigger URGENT routing. Values ≥ 0.70 alone trigger HIGH_RISK.
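The two routing rules stated above can be written as a small function. This is a sketch of those two rules only: the function name is hypothetical, `n_positive` stands for the N+ result from Stage 2B, and the `None` return is a placeholder, since lower tiers are not described in this section.

```python
def assign_tier(amyloid_positive_probability: float, n_positive: bool):
    """Apply the two tier rules described for Tool 2; None if neither fires."""
    if amyloid_positive_probability >= 0.85 and n_positive:
        return "URGENT"
    if amyloid_positive_probability >= 0.70:
        return "HIGH_RISK"
    return None  # placeholder: remaining tiers are not specified here


print(assign_tier(0.90, True))    # high probability with N+
print(assign_tier(0.90, False))   # high probability without N+
print(assign_tier(0.50, True))    # below both cutoffs
```

Checking the URGENT rule first matters, because any probability at or above 0.85 also satisfies the HIGH_RISK condition.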

PET Cost-Benefit Simulator (Tool 4)

The amyloid probability and diagnostic uncertainty score from Stage 2A are the primary inputs to the PET value calculation. Probabilities in the 0.40–0.70 range yield the highest PET value scores — confirmatory imaging changes management most in this zone.

Uncertainty Guard (Tool 6)

Conflicting signals between Stage 2A (high amyloid probability) and Stage 2B (N−) trigger an inter-stage conflict flag. The model confidence score contributes to the borderline probability and CI width rules.
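The inter-stage conflict check can be sketched as a single predicate. The document does not define what counts as a high amyloid probability for this rule, so the cutoff is a required argument here. The 0.70 used in the example is borrowed from the HIGH_RISK rule purely for illustration, and the function name is hypothetical.

```python
def inter_stage_conflict(amyloid_positive_probability: float,
                         n_positive: bool,
                         *,
                         high_probability_cutoff: float) -> bool:
    """True when Stage 2A reads high while Stage 2B reports N-."""
    is_high = amyloid_positive_probability >= high_probability_cutoff
    return is_high and not n_positive


# Illustrative cutoff only (assumption): reuse the 0.70 HIGH_RISK boundary.
print(inter_stage_conflict(0.82, False, high_probability_cutoff=0.70))
print(inter_stage_conflict(0.82, True, high_probability_cutoff=0.70))
```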

9. Limitations

  • The training dataset is small (n = 396). The cross-validation AUC standard deviation (±0.033) reflects this: performance estimates carry meaningful uncertainty and could shift in larger independent cohorts.
  • Plasma biomarker measurements are platform-dependent. p-tau217 and Aβ42/40 values vary across assay platforms (Elecsys, Lumipulse, Simoa, ALZpath). The model was trained on ADNI-specific assay outputs and may require recalibration for different laboratory platforms.
  • The FBB and AV45 PET tracers have different SUVR thresholds for amyloid positivity. While tracer-specific cutoffs were applied, residual inter-tracer variability may introduce noise in the training labels.
  • ADNI participants are highly selected: mostly MCI patients who consented to amyloid PET imaging. This introduces selection (referral) bias, so the model may underperform in unselected primary care populations where amyloid prevalence and biomarker distributions differ.
  • Pre-analytical variability in plasma collection (time to centrifugation, freeze-thaw cycles, tube type) significantly affects p-tau217 and Aβ42/40 measurements. Deployment requires strict pre-analytical standardization.
  • The model does not currently incorporate longitudinal plasma biomarker trajectories. A single time-point measurement captures current burden but not the rate of change, which is an independent predictor of progression.
  • This tool is intended for research-use clinical decision support only. It is not cleared as a medical device and must not be used as a standalone diagnostic instrument.
