Methodology · Published 2026-05-23

Foreign Intervention, graduated: what eighteen years of indicator trajectories say

QGI's first ML-validated recipe earns its risk score today. Four indicators, eighteen years, two hard validation gates cleared. The mechanic is similarity to a historical pre-crisis trajectory — and the line between similarity and probability is where this piece spends most of its time.

A recipe is now graduated

For most of QGI's history, the word recipe on our methodology page has been doing more work than the code. We've scored countries against patterns we believed were valid; we hadn't formally validated, in held-out historical data, the five components that a recipe must specify. As of today, one has been validated. The recipe is foreign_intervention. Both hard validation gates cleared. Risk scores derived from it are now published for the 2021 bake.

This piece is the long version of that announcement. It explains what graduated, what the underlying mechanic does, and — more importantly — what it cannot do. The validation that earned the graduation tells one specific thing: our scoring mechanic can discriminate pre-crisis trajectories from non-pre-crisis trajectories. It does not tell us, and we do not claim, a calibrated probability of the event itself.

What QGI means by “recipe”

The Methodology Charter defines a recipe as five components. Every recipe must specify all five before it can carry a published score:

  1. Key indicators — the small handful of substrate indicators (out of 104 we track) that carry the strongest discriminative signal for this crisis class.
  2. Canonical length — how many years (3–25y) the predictive signal accumulates over. Some crises develop over three years; some over twenty-five. The recipe says which.
  3. Canonical movement shape — the average trajectory of those indicators across the canonical length, computed from all historical positive instances in our corpus.
  4. Aggregation rule — how the per-indicator measurements combine into a single country-recipe number.
  5. Terminating event — the crisis type that defines the recipe. For foreign_intervention, a curated foreign-intervention event in our corpus.
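
The five components above can be pictured as a single record with a completeness check. The sketch below is illustrative only: the class name Recipe, its field names, and the problems() helper are hypothetical and not taken from QGI's codebase, and the canonical shape is filled with placeholder zeros because the real trajectories are not reproduced in this piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    """Hypothetical container for the five Charter components."""
    terminating_event: str                    # 5. crisis type that defines the recipe
    key_indicators: dict[str, float]          # 1. indicator code -> weight
    canonical_length: int                     # 2. years, 3 to 25
    canonical_shape: dict[str, list[float]]   # 3. indicator code -> mean trajectory
    aggregation_rule: str                     # 4. name of the combining rule

    def problems(self) -> list[str]:
        """Return the reasons this recipe could not yet carry a published score."""
        found = []
        if not self.key_indicators:
            found.append("no key indicators")
        if not 3 <= self.canonical_length <= 25:
            found.append("canonical length outside 3-25 years")
        if set(self.canonical_shape) != set(self.key_indicators):
            found.append("canonical shape does not cover every key indicator")
        for code, shape in self.canonical_shape.items():
            if len(shape) != self.canonical_length:
                found.append(f"shape for {code} has the wrong number of years")
        if not self.aggregation_rule:
            found.append("no aggregation rule")
        if not self.terminating_event:
            found.append("no terminating event")
        return found


weights = {"RQ.EST": 0.51, "v2pepwrgen": 0.27, "v2stfisccap": 0.17, "v2elintmon": 0.05}
# Placeholder shapes only: zeros stand in for the real canonical trajectories.
placeholder = {code: [0.0] * 18 for code in weights}
recipe = Recipe("foreign_intervention", weights, 18, placeholder,
                "weighted cosine similarity")
print(recipe.problems())  # an empty list means all five components are present
```

A recipe that is missing any component returns a non-empty list, which mirrors the Charter's rule that all five must be specified before a score is published.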

Validation is the part that didn't exist before. An XGBoost classifier trained on our historical corpus produces the indicator selection and the canonical length. Cosine similarity on the z-scored trajectories produces the aggregation. Held-out historical events test whether the resulting score actually recovers the cases it was supposed to.

What graduated, in numbers

For foreign_intervention, the validated five components look like this:

Indicator | Source | SHAP weight
Regulatory Quality (RQ.EST) | World Bank Governance | 51%
Power distribution by gender (v2pepwrgen) | V-Dem | 27%
State fiscal capacity (v2stfisccap) | V-Dem | 17%
International election monitoring (v2elintmon) | V-Dem | 5%

Canonical length: eighteen years. The classifier's discriminative AUC peaked at this window. A country's score is computed from its trajectory across the four indicators over the eighteen years preceding the evaluation date.

Canonical movement shape: the mean z-scored trajectory of each indicator, averaged across all historical foreign-intervention positives in the training set.
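
A minimal sketch of how such a canonical shape could be computed for one indicator. It assumes each positive instance is z-scored against its own window (mean and population standard deviation of that trajectory) before the year-by-year average is taken; this piece does not say which normalisation reference QGI uses, so treat that choice, and the function names, as assumptions.

```python
from statistics import mean, pstdev


def zscore(series):
    """Z-score one trajectory against its own mean and population std dev."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A flat trajectory carries no shape; represent it as all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def canonical_shape(positive_trajectories):
    """Average the z-scored trajectories of all positives, year by year."""
    z = [zscore(t) for t in positive_trajectories]
    length = len(z[0])
    if any(len(t) != length for t in z):
        raise ValueError("all trajectories must span the canonical length")
    return [mean(t[i] for t in z) for i in range(length)]


# Toy three-year trajectories, not real indicator data.
shape = canonical_shape([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(shape)
```

Because both toy trajectories rise linearly, they collapse to the same z-scored shape, which is the point of z-scoring: the canonical shape records movement, not level.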

Aggregation rule: cosine similarity is computed between each country's trajectory and the canonical trajectory, per indicator. The four per-indicator similarities are combined as a weighted average, with the SHAP-derived weights shown above. The result, in [-1, +1], is rescaled to [0, 100] for publication.
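
The aggregation rule can be sketched in a few lines. The rescaling is assumed here to be the linear map 50 × (similarity + 1), which is consistent with a score of 50 meaning an orthogonal trajectory; the function names and the toy trajectories are illustrative, not QGI's implementation.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length trajectories, clamped to [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a flat trajectory counts as orthogonal
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def recipe_score(country, canonical, weights):
    """Weighted average of per-indicator cosine similarities, rescaled to [0, 100]."""
    total = sum(weights.values())
    similarity = sum(
        w * cosine(country[code], canonical[code]) for code, w in weights.items()
    ) / total
    return 50.0 * (similarity + 1.0)  # assumed linear map from [-1, 1] to [0, 100]


weights = {"RQ.EST": 0.51, "v2pepwrgen": 0.27, "v2stfisccap": 0.17, "v2elintmon": 0.05}
# Toy two-point trajectories: every indicator matches the canonical shape exactly.
canonical = {code: [-1.0, 1.0] for code in weights}
country = {code: [-1.0, 1.0] for code in weights}
print(round(recipe_score(country, canonical, weights), 1))  # 100.0
```

Dividing by the sum of the weights keeps the result a true weighted average even if the weights are later rounded or renormalised.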

Terminating event: a curated foreign-intervention entry in our corpus — vetted by hand, timestamped to the year the intervention began.

What the risk score actually says

For every country, on every recipe that has graduated, QGI publishes a score in [0, 100]. The score reads:

Over the past eighteen years, how closely does this country's trajectory across the four key indicators align with the average pre-crisis trajectory of historical foreign interventions?

65 or higher is HIGH. Below 40 is LOW. Between is MODERATE. 50 is the point where the country's trajectory is geometrically orthogonal to the canonical shape — neither matching nor opposite.
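
As a quick illustration of those cut-offs, the sketch below maps a published score to its zone and back to the underlying similarity. The function names are hypothetical, and the inverse map assumes the rescaling from [-1, +1] to [0, 100] is linear.

```python
def zone(score):
    """Map a published score in [0, 100] to its zone label."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must lie in [0, 100]")
    if score >= 65.0:
        return "HIGH"
    if score < 40.0:
        return "LOW"
    return "MODERATE"


def score_to_similarity(score):
    """Invert the assumed linear rescaling: 0 -> -1, 50 -> 0, 100 -> +1."""
    return score / 50.0 - 1.0


print(zone(98.1), zone(50.0), zone(39.9))  # HIGH MODERATE LOW
print(score_to_similarity(50.0))           # 0.0, i.e. orthogonal
```

Under that linear assumption, the HIGH threshold of 65 corresponds to a weighted similarity of about 0.3, and the LOW threshold of 40 to about -0.2.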

In the 2021 bake, sixty-eight countries had complete coverage across the four indicators over the eighteen-year window. The three highest scores:

Iran 98.1 · HIGH (80% band 96.6–98.4)

Indicator trajectory overlay — Iran vs the foreign_intervention canonical pre-crisis profile.

Turkmenistan 97.0 · HIGH (80% band 95.6–97.2)

Indicator trajectory overlay — Turkmenistan vs the foreign_intervention canonical pre-crisis profile.

Equatorial Guinea 96.9 · HIGH (80% band 95.4–97.1)

Indicator trajectory overlay — Equatorial Guinea vs the foreign_intervention canonical pre-crisis profile.

Each chart overlays one country's trajectory across the four key indicators (faint coloured lines) against the canonical pre-crisis trajectory (heavier line). The closer the overlay, the higher the cosine similarity. In Iran's case, three of the four indicators trace the canonical shape closely across the full eighteen-year window. In Equatorial Guinea's case, the alignment is most pronounced on regulatory quality and fiscal capacity. The aggregate is similar; the substrate is country-specific.

The band after each score (e.g. Iran “98.1, 80% band 96.6–98.4”) is a margin of error, not decoration. We compute it by bootstrap-resampling the historical events that define the canonical shape and watching how much each country's score moves. It is honest about a real limit: the canonical shape for this recipe rests on far fewer events than the headline count suggests — its most heavily-weighted indicator, regulatory quality, is anchored on 46 historical instances. The band makes that uncertainty visible rather than hiding it behind a single confident number. Iran stays firmly in the HIGH zone even at the low end of its band.
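
The bootstrap behind the band can be sketched for a single indicator. This is a simplified illustration under stated assumptions: trajectories are already z-scored, the 80% band is read off as the 10th and 90th percentiles of the resampled scores, and the function names, the resample count, and the seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def cosine(a, b):
    """Cosine similarity of two equal-length trajectories, clamped to [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return max(-1.0, min(1.0, dot / (norm_a * norm_b)))


def score(country, events):
    """Score a country against the mean shape of the given historical events."""
    canonical = [mean(e[i] for e in events) for i in range(len(country))]
    return 50.0 * (cosine(country, canonical) + 1.0)


def bootstrap_band(country, events, n_boot=1000, seed=0):
    """Resample events with replacement and return the (10th, 90th) percentile scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        resample = [rng.choice(events) for _ in events]
        scores.append(score(country, resample))
    scores.sort()
    low = scores[int(0.10 * (n_boot - 1))]
    high = scores[int(0.90 * (n_boot - 1))]
    return low, high


# Toy z-scored trajectories, not real indicator data.
events = [[-1.0, 0.0, 1.0], [-1.0, 0.5, 0.5], [0.0, -1.0, 1.0]]
country = [-1.0, 0.0, 1.0]
print(score(country, events), bootstrap_band(country, events, n_boot=200, seed=1))
```

The fewer events there are to resample, the more the canonical shape shifts between draws and the wider the band becomes, which is why a recipe anchored on a small number of instances needs the band published alongside the score.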

The validation that earned the graduation

Graduation is not automatic. Before a recipe's scores are published, four validations run. Two are hard gates — if either fails, the recipe does not graduate. The other two produce evidence but do not block.

Validation | Type | Result

V1 Positive-instance recall | HARD GATE | PASS
Median similarity at one year before event vs non-event baseline. Foreign intervention: 96.0 vs 70.1, a 25.8 percentile-point separation. Gate: 20pp.

V2 Discrimination-consistency correlation | Reporting only
Correlation between similarity score and XGBoost discriminator probability. Pearson 0.26, Spearman 0.50. Moderate; not problematic.

V3 Per-cohort risk landscape | Reporting only
Median similarity by cohort. Stable-democracy 5.9, mid-tier 66.7, fragile-state 94.8. Cleanly discriminating across three tiers.

V4 Out-of-time holdout | HARD GATE | PASS
Held-out historical events must hit HIGH zone (≥65) at event-Y-1. Foreign intervention: 13 of 15 holdout events (86.7%). Gate: 60%.

Both hard gates cleared. The recipe graduates.
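
The two hard gates reduce to simple threshold checks, sketched below. The function names are hypothetical, and treating each gate as "greater than or equal to" its threshold is an assumption; the text gives only the thresholds themselves (20pp and 60%).

```python
from statistics import median


def v1_gate(event_scores, baseline_scores, min_separation=20.0):
    """V1: median score one year before events must exceed the non-event median."""
    separation = median(event_scores) - median(baseline_scores)
    return separation, separation >= min_separation


def v4_gate(holdout_scores, high_threshold=65.0, min_hit_rate=0.60):
    """V4: share of held-out events scoring in the HIGH zone at event-Y-1."""
    hits = sum(1 for s in holdout_scores if s >= high_threshold)
    hit_rate = hits / len(holdout_scores)
    return hit_rate, hit_rate >= min_hit_rate


def graduates(v1_passed, v4_passed):
    """A recipe graduates only if both hard gates pass."""
    return v1_passed and v4_passed


# Illustrative scores reproducing the reported count: 13 of 15 holdouts in HIGH.
holdout = [70.0] * 13 + [50.0] * 2
hit_rate, v4_ok = v4_gate(holdout)
print(round(100 * hit_rate, 1), v4_ok)  # 86.7 True
```

V2 and V3 are deliberately absent from graduates(): they are reported as evidence but cannot block a graduation.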

Why we are publishing this

The methodology pause that preceded this graduation was deliberate. For most of QGI's life, the scoring pipeline produced numbers without an ML-validated recipe substrate. The numbers were useful upstream of the validation gates the Methodology Charter demanded — and we said so — but they did not constitute a graduated recipe.

Foreign intervention is the first to clear those gates. Two parallel walks (civil_war, econ_recession) on the same framework returned different verdicts: both demonstrated that the indicator-selection mechanic worked, but neither cleared the additional thresholds the per-recipe pre-registration demanded. Foreign intervention cleared discrimination. The mechanism that earned this graduation (XGBoost-derived indicator selection, SHAP-weighted cosine similarity on z-scored trajectory profiles, and held-out out-of-time validation) is now an established precedent that any subsequent recipe can follow.

We expect more graduations in the coming weeks. We do not expect every recipe to clear. The validation framework is designed to fail recipes when the historical signal isn't there. That is the point. Recipes that fail are not published; recipes that pass are.

For the reader who wants more

Published 2026-05-23 · QG Intelligence · Indicator weights, validation statistics, and country scores cited from the validated 2021 bake of the foreign_intervention recipe. The discrimination layer is shipped directly; calibrated probability is deliberately not claimed. This piece reports the first formal ML recipe graduation and the validation that earned it. It is not a forecast.