Curation in practice: Iran, Ukraine, Venezuela

Most of the writing on this blog is about scoring methodology: what algorithm turns data into a country tier, why one version replaces another. That's the visible half of the work. The invisible half is the data itself. Every score QGI publishes is computed from a corpus of hand-verified historical events that someone had to collect, classify, and check. If that corpus is wrong, the scoring math is irrelevant.

This post is about what that data work looked like on one particular afternoon for three particular countries.

Why these three

Each of the three had a different kind of problem, and that's why we did them together. Seeing three failure modes side-by-side is more useful than three passes on variants of the same issue.

Iran had volume but missed the last decade. The existing 146 events were heavy on the Revolution, the Iran–Iraq war, and the Ahmadinejad years, and thin on the post-JCPOA period: the 2017–18 protests, the 2019 fuel-price protests, the 2022 Mahsa Amini unrest, the 2024 presidential transition, the post-October-2023 regional escalation. A scoring engine that doesn't know 2017 happened can't tell you anything useful about 2026.

Ukraine was thin on everything. Forty-three events total, mostly post-2014. This is a country whose post-Soviet period alone produced more regime-defining ruptures than most states manage in two hundred years (the 1991 independence, the 2004 Orange Revolution, the 2014 Maidan, Crimea, Donbas, the 2022 full-scale invasion), and we had forty-three entries. That's not coverage. That's a stub.

Venezuela had the most interesting failure: volume was fine (51 events) but the categories were wrong. Chávez-and-Maduro-era Venezuela had exactly one event coded as political_repression in our entire curated history, and zero humanitarian_crisis entries for a country that six million people have left. The events existed in the file (RCTV's forced closure, the 2014 SEBIN detentions, the 2017 protest deaths, the 2019 blackouts, the Tun Tun operation), but most were coded as social_unrest or economic_crisis, which understates what was happening. The engine was getting the right country with the wrong labels.

The process

We ran three curation passes in parallel, one per country, each using the same canonical template. The template is the same one we rolled out across all curation work two weeks ago: it inlines the 71-category taxonomy directly in the prompt, which dropped the confabulation rate from roughly 20% (agents inventing plausible-sounding categories that aren't in the schema) to effectively zero when we re-audited. Every candidate event had to come with a verifiable web source, a real date, and a category from the fixed set. We then manually spot-checked each file before syncing to S3 and re-ingesting into the events table.
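The three acceptance rules above (a verifiable source, a real date, a category from the fixed set) are mechanical enough to sketch. This is a minimal illustration, not our pipeline: the field names `category`, `date`, and `source_url` are assumptions, and the category set is a four-item stand-in for the 71-category taxonomy.

```python
from datetime import date
from urllib.parse import urlparse

# Hypothetical stand-in: the real taxonomy has 71 categories.
ALLOWED_CATEGORIES = {
    "political_repression",
    "humanitarian_crisis",
    "social_unrest",
    "economic_crisis",
}


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event passes."""
    problems = []

    # The category must come from the fixed set, never be invented.
    category = event.get("category")
    if category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {category!r}")

    # The date must parse as a real calendar date (ISO format, YYYY-MM-DD).
    try:
        date.fromisoformat(event.get("date", ""))
    except (TypeError, ValueError):
        problems.append(f"invalid date: {event.get('date')!r}")

    # The source must at least look like an http(s) URL. Whether the page
    # actually supports the event is still a human spot-check.
    parsed = urlparse(event.get("source_url") or "")
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("missing or malformed source_url")

    return problems
```

A check like this only catches structural failures. It can reject an invented category or an impossible date, but it cannot tell whether the source says what the event claims, which is why the manual spot-check stays in the loop.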

Total wall time: about an afternoon. Total cost: the API calls, a few dollars.

The results

Iran: 146 → 181 (+35 net after dedup). The additions are dominated by the 2010s protest wave, the 2015 JCPOA and its 2018 unwinding, the 2022 protests, and the 2023–24 regional escalation. Top categories now: civil_war_and_insurgency (21), bilateral_treaty (20), political_repression (16), foreign_intervention (12). The shape now matches Iran as it actually is: a state whose history is dominated by internal security, foreign alignment shifts, and contested domestic legitimacy.

Ukraine: 43 → 85 (+42 net, done in two passes). First pass targeted 100 events. It produced 67, and on review we rejected several as not meeting the verification bar: events that happened but couldn't be pinned to a single clear source, or that were already covered by a broader entry. Second pass brought it to 85 at the quality bar we wanted rather than 100 at a softer one. Coverage now spans 1991 independence through the 2024 battlefield situation, with proper weight on the 2004 Orange Revolution, the 2013–14 Maidan sequence, the 2015 Minsk II period, and the 2022–24 full-scale war. Top categories: civil_war_and_insurgency (12), foreign_intervention (7), interstate_war (6).

Venezuela: 51 → 71 (+20 net, with a category rebalance). This was the most interesting of the three because the headline number understates the change. The three high-signal categories that were previously almost empty (political_repression, banking_and_financial_crisis, humanitarian_crisis) went from a combined 2 events to 14. The events are real things that happened to real people: the migration crisis, the bolívar collapse, the ANC period, the 2017 and 2024 detention waves. They were missing not because they were obscure but because the original curation had used softer labels.
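A net count can't show a relabel, but a per-category delta can. Here is a rough sketch of that comparison, assuming each event is a dict with a `category` field (a hypothetical field name, and the function is illustrative rather than something in our tooling).

```python
from collections import Counter


def category_delta(before: list[dict], after: list[dict]) -> dict[str, int]:
    """Per-category change in event count between two versions of a file."""
    old = Counter(event["category"] for event in before)
    new = Counter(event["category"] for event in after)
    # Counter returns 0 for missing keys, so categories that appear or
    # disappear entirely are handled without special cases.
    return {
        category: new[category] - old[category]
        for category in sorted(set(old) | set(new))
        if new[category] != old[category]
    }
```

Categories that shrink show up as negative numbers, so an event moved from social_unrest to political_repression appears as a -1 and a +1 even though the file's total doesn't change.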

Where we didn't hit the target

Ukraine's 85 versus the 100-event target is worth naming. We set the target before we knew what the verifiable-source pool looked like. By the time we got to event 85, further additions would have meant either duplicates of existing entries at finer granularity (three different Donbas offensives instead of one) or events we couldn't cleanly source. Neither of those helps the scoring engine. A broader spread of verified rare events beats a denser cluster of borderline ones.

The same logic applied to Iran's +35 rather than a bigger number: the goal isn't to maximize event count, it's to produce a history the engine can learn from. Events that repeat the same underlying signal at different resolutions add noise, not information.

What this doesn't solve

Three countries curated in an afternoon is not a scalable plan for the 198 countries we want to cover. Three things are still missing from this story:

Automation. We don't have a pipeline that pulls candidate events from GDELT or ACLED, auto-classifies them against the taxonomy, and queues the uncertain ones for human review. That's the next-wave curation system. Until it exists, every new country requires a parallel agent pass and human review.
Back-coding. Venezuela's problem, under-labeled categories in an existing file, probably exists in other files we haven't re-audited. Category-stratified retrospective coding is on the roadmap; we haven't started.
Scale. The methodology fix we shelved last week (cross-country normalization) needs closer to 190 curated countries to validate properly. We're at 173 in the curated working set and 138 actually ingested into Athena. The gap between those two numbers is its own piece of work.
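The automation item describes a system that doesn't exist yet, so the following is only a sketch of its routing step under stated assumptions. The `classify` callable, the confidence score it returns, and the 0.9 threshold are all hypothetical, and fetching candidates from GDELT or ACLED is deliberately left out.

```python
from typing import Callable

# Hypothetical cutoff; a real value would have to be tuned against audits.
REVIEW_THRESHOLD = 0.9


def triage(
    candidates: list[dict],
    classify: Callable[[dict], tuple[str, float]],
    allowed: set[str],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[list[dict], list[dict]]:
    """Split candidates into (auto-accepted, queued for human review)."""
    accepted, review = [], []
    for event in candidates:
        category, confidence = classify(event)
        labeled = {**event, "category": category, "confidence": confidence}
        # Anything outside the taxonomy goes to a human no matter how
        # confident the classifier claims to be.
        if category in allowed and confidence >= threshold:
            accepted.append(labeled)
        else:
            review.append(labeled)
    return accepted, review
```

The design choice worth noting is the asymmetry: an out-of-taxonomy label is never auto-accepted, which mirrors the fixed-category rule in the current manual process.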

Why this matters

The scoring pages on this site talk in terms of z-scores and recipe fusion and tier thresholds. All of that math runs on events like the 97 we added today (35 for Iran, 42 for Ukraine, 20 for Venezuela, all net). If the events are wrong (missing, miscategorized, unsourced, made up), no clever scoring choice can repair it. The math computes faithfully from the input. It doesn't know the input is bad.

Most forecasting products keep their event corpora opaque. Ours isn't, partly because it has nowhere to hide: the files are in the repository. When Iran goes from 146 to 181 events, that's a line-item change in a JSON file anyone can diff. When Venezuela's political_repression count goes from 1 to 5, the five events are individually verifiable and come with URLs.
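For readers who want to do that diff themselves, here is one way it might look. It assumes each file is a JSON array of event objects and that `date` plus `title` identifies an event. Both are assumptions for illustration; the actual files may use different fields or an explicit ID.

```python
import json


def event_key(event: dict) -> tuple[str, str]:
    # Hypothetical identity: date plus title.
    return (event["date"], event["title"])


def diff_events(old_json: str, new_json: str) -> tuple[list[dict], list[dict]]:
    """Return (added, removed) events between two JSON arrays of events."""
    old = {event_key(e): e for e in json.loads(old_json)}
    new = {event_key(e): e for e in json.loads(new_json)}
    # Dict key views support set difference; sorting keeps output stable.
    added = [new[k] for k in sorted(new.keys() - old.keys())]
    removed = [old[k] for k in sorted(old.keys() - new.keys())]
    return added, removed
```

An event whose date or title was edited shows up as one removal plus one addition under this keying, which is a limitation of the sketch rather than a statement about how the files change.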

That transparency is uncomfortable to maintain. It means you can't paper over a thin country by writing a confident-sounding methodology page. It also means when the scoring produces a HIGH tier for Venezuela, there's a documented history underneath the number. Something to argue with, rather than something to take on faith.