Three Predictions on Iran: What Perspectives Got Right, Got Wrong, and Couldn't See Coming

On January 28, 2026, we ran three related predictions through Perspectives, a multi-agent forecasting system that uses AI personas to debate the likelihood of future events. The questions were all focused on Iran over a short time horizon:

Will Khamenei be out as Supreme Leader of Iran by March 31?
Will Israel strike Iran by March 31, 2026?
Will the Iranian regime fall by March 31?

Exactly one month later, on February 28, a joint US-Israeli military operation struck Iran. Supreme Leader Ayatollah Ali Khamenei was killed in an Israeli airstrike on his Tehran compound. Iranian state media confirmed his death the following day.

Two of the three questions have now resolved. This article examines how the system handled each prediction: what the personas argued, where they agreed and disagreed, what reasoning held up, and where it failed. The goal is transparency about the system's current capabilities and limitations, with a view to improving calibration.

How the System Works

Each prediction ran through Perspectives' forecasting pipeline using the Forecaster persona set: eight analytically distinct personas, each approaching the question from a different angle. The personas are: The Base Rate Analyst, The Contrarian, The Insider, The Risk Analyst, The Scenario Planner, The Sceptic, The Systems Thinker, and The Trend Analyst.

The workflow proceeds in phases. First, the system conducts background research, running web searches to establish shared factual context. Individual personas then run additional searches to gather evidence for their own arguments.

After research, each persona independently writes a blind proposal containing their analysis and a probability estimate. Blind proposals are critical to the system's design. No persona sees another's work before committing their own position. This prevents early arguments from pulling everyone else toward the same conclusion, and forces analytical diversity.

Once proposals are locked in, the system enters structured interrogation. Each persona has their analysis challenged three times by other personas, who probe for weaknesses: whether their confidence is well-calibrated, whether they've accounted for unlikely scenarios, whether historical patterns support their reasoning, and whether events might unfold differently than they expect. Each challenge receives a response from the original author, who can either concede the point or defend it. If they defend, the challenger decides whether the defence held up.
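The challenge, response, and verdict loop described above can be pictured as a small data model. What follows is a minimal sketch, not Perspectives' actual implementation: the `Verdict`, `Challenge`, `resolve`, and `tally` names are hypothetical, and the mapping of "the challenger decides the defence did not hold up" onto a "dispute" is an assumption based on the three outcome labels this article reports (concessions, defences, disputes).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONCEDED = "concession"  # the author accepted the challenger's point
    DEFENDED = "defence"     # the author defended and the challenger accepted it
    DISPUTED = "dispute"     # the author defended but the challenger was not persuaded


@dataclass(frozen=True)
class Challenge:
    challenger: str
    author: str
    verdict: Verdict


def resolve(author_concedes: bool, challenger_accepts_defence: bool) -> Verdict:
    """Map the two decisions made in one exchange onto a verdict."""
    if author_concedes:
        return Verdict.CONCEDED
    return Verdict.DEFENDED if challenger_accepts_defence else Verdict.DISPUTED


def tally(challenges: list[Challenge]) -> dict[str, int]:
    """Count verdicts by label, reporting zero for labels that never occur."""
    counts = Counter(c.verdict for c in challenges)
    return {v.value: counts.get(v, 0) for v in Verdict}


# Three exchanges reported in this article, one from each debate.
exchanges = [
    Challenge("The Insider", "The Sceptic", resolve(True, False)),
    Challenge("The Contrarian", "The Sceptic", resolve(False, False)),
    Challenge("The Systems Thinker", "The Risk Analyst", resolve(False, True)),
]
print(tally(exchanges))
```

With eight personas challenged three times each, a full debate produces 24 such records, which matches the challenge counts reported for each debate.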

Finally, the personas vote using ranked-choice balloting to select the most convincing overall analysis. The system aggregates all probability estimates into a final prediction.
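For readers unfamiliar with ranked-choice balloting, here is a minimal sketch of one common variant, instant-runoff counting. It is illustrative only: the article does not specify which elimination or tie-break rules Perspectives uses, the `instant_runoff` function is a hypothetical name, and the ballots are made up for the example.

```python
from collections import Counter


def instant_runoff(ballots: list[list[str]]) -> tuple[str, int]:
    """Return (winner, rounds) under a simple instant-runoff count.

    Each ballot lists candidates from most to least preferred. Ties are
    broken alphabetically, an arbitrary choice made for this sketch.
    Assumes at least one non-empty ballot.
    """
    remaining = {name for ballot in ballots for name in ballot}
    rounds = 0
    while True:
        rounds += 1
        # Each ballot counts for its highest-ranked candidate still in the race.
        firsts = Counter(
            next(name for name in ballot if name in remaining)
            for ballot in ballots
            if any(name in remaining for name in ballot)
        )
        leader, votes = max(firsts.items(), key=lambda kv: (kv[1], kv[0]))
        # Stop when the leader holds a strict majority or stands alone.
        if votes * 2 > sum(firsts.values()) or len(remaining) == 1:
            return leader, rounds
        # Otherwise eliminate the candidate with the fewest first-choice votes.
        loser = min(remaining, key=lambda name: (firsts.get(name, 0), name))
        remaining.discard(loser)


# Made-up ballots, not the real votes from any debate.
ballots = [
    ["The Insider", "The Sceptic", "The Trend Analyst"],
    ["The Insider", "The Trend Analyst", "The Sceptic"],
    ["The Sceptic", "The Trend Analyst", "The Insider"],
    ["The Sceptic", "The Insider", "The Trend Analyst"],
    ["The Trend Analyst", "The Sceptic", "The Insider"],
]
print(instant_runoff(ballots))  # ('The Sceptic', 2)
```

In this toy count nobody has a majority in round one, The Trend Analyst is eliminated, and that ballot transfers to The Sceptic, who wins in round two. The vote selects an analysis; it says nothing by itself about how the probability estimates are combined.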

Debate 1: Khamenei Out as Supreme Leader by March 31?

Aggregate prediction: 30% likelihood of Khamenei leaving office by March 31, 2026.

Actual outcome: Khamenei was killed on February 28, 2026, in an Israeli airstrike on his Tehran compound. Confirmed by Iranian state media on March 1. This question resolves YES.

What the Personas Argued

The system ran twelve web searches for this debate, covering Khamenei's health reports, succession planning, and the constitutional provisions for removing a Supreme Leader.

Most personas clustered between 5% and 25%, predicting Khamenei would remain in office. Their reasoning converged on a few key points. The historical record showed only one leadership transition in 46 years. The succession deadlock around Khamenei's son Mojtaba (who lacks the required clerical rank for the role) meant the regime had every incentive to prop up the existing leader. And there was evidence of continued agency: CNN reported on January 17 that Khamenei had publicly addressed the nation. The Sceptic put it plainly: "You do not give speeches ordering security forces to crush dissent if you are on a ventilator."

The Trend Analyst (10-20%), who eventually won the ranked-choice vote, reinforced this reading. The scale of the crackdown (reports of up to 30,000 protesters killed) served as evidence of operational control: orchestrating violence at that scale requires active leadership.

The Insider (15-25%) reported that contacts in Tehran suggested a shift to "managed stasis," with Khamenei's schedule being planned months in advance. During interrogation, The Sceptic caught an inconsistency: the Insider's written argument implied near-certainty that Khamenei would stay, yet their probability range (15-25% chance of leaving) only represented 75-85% confidence. The Insider conceded the error. This is a case where the interrogation process caught and corrected a persona's own internal contradiction.

Two personas predicted much higher probabilities. The Risk Analyst (65-80%) argued that Khamenei's July 2025 public appearance was staged to prove he had survived the Twelve-Day War, and that he then vanished again for seven months. They cited reports suggesting Khamenei was in a coma. The Systems Thinker (80-90%) went further, stating outright: "Biology is linear and unforgiving; at 86, he's likely dead or incapacitated." Yet their written argument also claimed the regime would never declare him out, because doing so would trigger the succession crisis. Their probability captured the biological reality, while their reasoning pointed in the opposite direction. This tension reveals a limitation in how Perspectives reconciles a persona's numbers with its reasoning.

The Interrogation

The debate generated 24 challenges: 4 concessions, 3 defences, and 17 disputes.

The most productive challenge came from The Insider targeting The Sceptic. The Insider argued that the January 17 televised address lacked the spontaneity of a live broadcast and that operational orders were bypassing Khamenei's office, flowing directly through his son and the IRGC (Iran's military-political force). The Sceptic conceded, acknowledging they had treated the televised address as solid evidence when, in a regime known for media manipulation, that was an assumption.

The Scenario Planner challenged The Base Rate Analyst on their treatment of mortality risk. Dividing an annual mortality rate by six assumes risk is spread evenly across the year, but medical emergencies are sudden. An overnight stroke doesn't care about averages. The Base Rate Analyst conceded the blind spot.
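The arithmetic behind this exchange can be made concrete. The sketch below is an illustration under stated assumptions, not the Base Rate Analyst's actual calculation: the 12% annual rate is a placeholder rather than a sourced mortality figure, and the two-state numbers are invented purely to show the direction of the effect. All function names are hypothetical.

```python
def linear_share(annual_p: float, fraction_of_year: float) -> float:
    """The 'divide by six' shortcut: scale the annual probability linearly."""
    return annual_p * fraction_of_year


def constant_hazard_share(annual_p: float, fraction_of_year: float) -> float:
    """Window probability if the risk is constant at every moment of the year."""
    return 1.0 - (1.0 - annual_p) ** fraction_of_year


def two_state_window(p_acute: float, p_if_acute: float, p_if_stable: float) -> float:
    """Window probability when the subject may already be in an acute state."""
    return p_acute * p_if_acute + (1.0 - p_acute) * p_if_stable


ANNUAL_P = 0.12   # placeholder annual rate, NOT a sourced figure
WINDOW = 1 / 6    # roughly two months, as in the forecast horizon

evenly_spread = linear_share(ANNUAL_P, WINDOW)
constant_hazard = constant_hazard_share(ANNUAL_P, WINDOW)
# Invented numbers: a 20% chance of an acute episode carrying 30% window risk.
lumpy = two_state_window(0.20, 0.30, evenly_spread)

print(f"divide by six:   {evenly_spread:.4f}")
print(f"constant hazard: {constant_hazard:.4f}")
print(f"two-state mix:   {lumpy:.4f}")
```

The first two numbers come out nearly identical because both assume evenly spread risk. The Scenario Planner's objection concerns the third kind of model: once there is some chance the subject is already in a high-risk state, the window probability can sit well above the annual rate divided by six.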

The Systems Thinker faced three challenges and failed to respond to any of them: each attempted response triggered a content filter error. This meant one of the two highest-probability personas (80-90%) never had its reasoning tested. In a debate that ultimately resolved in the direction their probability indicated, the absence of stress-testing matters. If their reasoning had been interrogated and survived, it would have pulled the aggregate upward. If it had been interrogated and conceded, the aggregate would better reflect the consensus.

The Vote

The ranked-choice vote eliminated personas over eight rounds. The Trend Analyst won in the final round. Their argument, that the crackdown evidenced continued operational control and the succession deadlock preserved the status quo, was selected as the most convincing analysis.

What Actually Happened

The Trend Analyst's victory, and the aggregate prediction of 30%, reflected a reasonable assessment of the information available on January 28. Most personas correctly identified the succession deadlock, IRGC loyalty, and regime survival instinct as stabilising factors. These were sound analytical inputs for the scenarios they modelled.

What the system could not account for was outside intervention at the scale that occurred. The Scenario Planner explicitly modelled assassination or military strike and assigned it only 5%, noting that "direct strikes on the Supreme Leader are extreme escalation risks that even Israel has historically avoided." This reasoning was defensible on January 28. The joint US-Israeli operation on February 28 was, by historical standards, unprecedented.

The Risk Analyst and Systems Thinker assigned higher probabilities (65-80% and 80-90%), but both were modelling biological collapse, not military action. The Risk Analyst's reasoning that Khamenei was in a coma was wrong about the mechanism, even though their probability was closer to the actual outcome.

Debate 2: Israel Strikes Iran by March 31, 2026?

Aggregate prediction: 44% likelihood of Israel striking Iran by March 31, 2026.

Actual outcome: On February 28, 2026, the United States and Israel launched a joint military operation against Iran, including Israeli missile strikes across Tehran and other locations. This question resolves YES.

This debate had the most formally specified resolution criteria of the three, defining a qualifying strike as "the use of aerial bombs, drones or missiles launched by Israeli military forces that impact Iranian ground territory or any official Iranian embassy or consulate." Intercepted missiles would not qualify.

What the Personas Argued

The system ran sixteen web searches for this debate, more than the twelve in the Khamenei debate, covering Israeli emergency munitions procurement, Netanyahu's January 2026 statements, US carrier deployment, and Iran's nuclear breakout timeline.

The debate split into two clear camps: those who thought a strike was likely, and those who thought it was unlikely.

Personas predicting a strike was likely (above 50%):

The Insider (80-90%) pointed to a $183 million emergency procurement contract with Elbit Systems signed on January 27, Netanyahu's warning of force Iran "has never seen," and the arrival of the USS Abraham Lincoln carrier strike group. When challenged that procurement lead times made imminent use unlikely, The Insider countered: "You're reading the contract; I'm watching the loading docks. This is 'emergency procurement,' which draws down existing stockpiles immediately."

The Systems Thinker (70-85%) argued that the June 2025 Operation Rising Lion had proven the military option worked, creating a self-reinforcing pattern: successful action reduces the perceived cost of future action. The Contrarian (60-75%) agreed, arguing the June operation was a "proof of concept" and that with Iranian air defences degraded and enrichment continuing, the strategic logic had shifted from deterrence to active degradation.

Personas predicting a strike was unlikely (below 30%):

The Sceptic (15-25%) argued from post-conflict exhaustion: seven months is not enough to rebuild after a full campaign. They characterised Netanyahu's rhetoric as standard deterrence posturing. The Trend Analyst (15-25%) offered a different reading of the carrier deployment: rather than signalling an imminent attack, the carrier served as a security buffer that reduced the need for Israeli military action. The Base Rate Analyst (10-25%) pointed out that historically, countries very rarely launch a second major air campaign within months of the first.

The Risk Analyst (35-45%) fell between the two camps, acknowledging the conditions for a strike while noting that conditions and decisions are different things. They were the only persona to successfully defend their position against all three of their interrogation challenges.

The Interrogation

This debate produced the highest tension of the three: 24 challenges with 0 concessions, 9 defences, and 15 disputes. Zero concessions is significant. No persona was willing to admit a fundamental flaw in their reasoning. The result was deeply entrenched disagreement, which the aggregate (44%) reflects.

The most revealing exchange came between The Sceptic and The Contrarian. The Contrarian challenged The Sceptic's assumption that stability would hold, arguing they were confusing practical constraints with political ones: "Netanyahu might strike because he has nothing left to lose." The Sceptic drew a distinction between political intent and military capability, arguing that even if Netanyahu wanted to strike, material constraints imposed a real floor on the timeline. The verdict was "disputed," which in hindsight captures the genuine uncertainty well: the political intent was there, and the capability turned out to be sufficient, though it required a joint operation with the United States, something the debate never modelled.

The Vote

The Sceptic won the ranked-choice vote, collecting support from the more cautious personas. This is notable: the system's voting process selected the analysis arguing against a strike, and the strike happened.

What Actually Happened

The aggregate of 44% was closer to reality than most individual estimates, but the mechanism was wrong. No persona modelled a joint US-Israeli operation. The debate framed the question as an Israeli decision constrained by logistics, US diplomatic pressure, and political capital. The actual event was a coordinated campaign with explicit US participation, which removed the capability constraints several personas relied on.

The Insider's emphasis on procurement signals and the carrier deployment proved directionally correct, though even they framed it as US support enabling an Israeli decision, not a jointly planned operation. The Trend Analyst's reading of the carrier as a dampener turned out to be wrong.

The Sceptic's argument about post-conflict exhaustion was reasonable on its own terms: Israel alone might struggle to mount another operation so quickly after June 2025. What they missed was that US participation removed that constraint entirely.

Debate 3: Will the Iranian Regime Fall by March 31?

Aggregate prediction: 16% likelihood of regime collapse by March 31, 2026.

Actual outcome: As of March 1, 2026, the regime has not fallen. A three-person interim leadership council (consisting of the president, the chief of the judiciary, and a jurist of the Guardian Council) has assumed temporary authority following Khamenei's death. The IRGC chain of command appears to have been severely damaged, with reports indicating the defence minister, the commander of the Revolutionary Guard Corps, and the secretary of the Iranian Security Council were killed in the strikes. The situation remains deeply unstable and this question is unresolved.

What the Personas Argued

This debate ran the most web searches of the three (twenty-one), reflecting the breadth of variables involved: economic data, protest dynamics, IRGC cohesion, succession planning, and historical patterns of regime collapse.

Every persona agreed on a single pivot point: the cohesion of the IRGC. Without a fracture in the security forces, the regime would survive regardless of other pressures. They diverged on how likely that fracture was.

Most personas clustered between 1% and 25%. The Base Rate Analyst (1-5%) presented the lowest estimate: once the regime demonstrated it could carry out mass killings without internal defections, the historical precedent for a 60-day collapse drops to near zero. The Contrarian (10-20%) argued that protesters lack weapons and two months is too short for uprisings to topple a military-backed authoritarian regime. The Insider (10-20%) provided the most detailed picture: capital flight patterns (private jet traffic at three-year highs from Tehran to Dubai and Istanbul), IRGC command centres relocated to Qom, and Basij paramilitary units experiencing delayed pay and food shortages. They argued the regime was in a "bloody stabilisation" phase that could sustain it for months or years.

The Risk Analyst (15-25%) focused on the Supreme Leader as a single point of failure: "In a system built around absolute authority, the death of the centre creates a vacuum that fills with chaos faster than the guards can contain it." They successfully defended this position against The Systems Thinker's challenge, arguing that the IRGC could only self-correct if there was a clear institution or successor to rally around, and there wasn't one.

The Trend Analyst (25-40%) assigned the highest probability, citing the accelerating pace of deterioration: currency collapse, escalating violence, and reports of IRGC war fatigue. They argued that historical comparisons fail when the rate of change is this extreme.

The Interrogation

Twenty-four challenges produced 3 concessions, 5 defences, and 16 disputes. The concessions show the system correcting its own errors in real time.

The Scenario Planner conceded to The Sceptic that a single public appearance by Khamenei was insufficient to justify near-certainty in regime survival. The Systems Thinker conceded to The Sceptic that their model was fragile: if the reported death toll represented decentralised panic rather than deliberate strategy, it would actually indicate a breakdown in command, not evidence of control. The Sceptic conceded to The Base Rate Analyst that they had failed to anchor against the broader historical record of authoritarian survival.

The Vote

The Sceptic won the ranked-choice vote for the second time across the three debates, collecting broad support for their balanced assessment of threat and resilience.

What Is Happening Now

The regime has not fallen as of this writing. However, the conditions have shifted dramatically. The joint US-Israeli strikes killed Khamenei, the defence minister, and the commander of the Revolutionary Guard Corps, the very institution every persona identified as the key variable. The operation explicitly aimed at regime change, with President Trump urging Iranians to "take over your government" and Netanyahu stating the goal was to "create the conditions for the brave Iranian people to take their destiny into their own hands."

The Risk Analyst's framing of the Supreme Leader as a "single point of failure" has been directly tested. The Insider's observations about capital flight and parallel command structures now read as prescient context for the current succession crisis. The Contrarian's argument that protesters "lack guns" remains structurally true, but external military intervention has introduced a variable none of the personas modelled: the systematic destruction of the regime's military command structure from outside.

Whether the regime survives in some form, transitions to military rule, or collapses entirely remains to be seen. The 16% aggregate may prove to have been too low, but as of March 1, the regime's formal institutions still exist, even if severely damaged.

Calibration Analysis: Patterns Across All Three Debates

A single prediction resolving one way or another does not tell you whether a system is well-calibrated. When a well-calibrated system predicts 30%, the event should happen roughly 30% of the time across many such predictions. You cannot evaluate that from one instance: a 30% event happening is not evidence of a mistake. The value of examining these predictions lies in the reasoning patterns, not in grading individual numbers against outcomes.
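One standard way to score probabilistic forecasts once enough of them have resolved is the Brier score: the mean squared gap between the forecast probability and the 0-or-1 outcome. The sketch below applies it to the two resolved questions purely to show the mechanics. The article does not say whether Perspectives uses this metric, and `brier_score` is a hypothetical helper name.

```python
def brier_score(forecasts: list[tuple[float, int]]) -> float:
    """Mean squared error between probabilities and binary outcomes.

    0.0 is a perfect score; always answering 50% scores 0.25.
    """
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# (aggregate probability, outcome) for the two resolved questions.
resolved = [
    (0.30, 1),  # Khamenei out by March 31: resolved YES
    (0.44, 1),  # Israel strikes Iran by March 31: resolved YES
]

print(round(brier_score(resolved), 4))  # 0.4018
```

A score of about 0.40 is worse than the 0.25 that a constant 50% forecast would earn, but with only two correlated questions the number carries little weight. It becomes informative only across many independent predictions.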

That said, three related predictions on the same geopolitical situation do reveal structural tendencies in how the system reasons.

Pattern 1: The System Systematically Underweighted Outside Intervention

Across all three debates, the dominant analytical framework was internal dynamics: IRGC cohesion, succession planning, economic pressure, protest momentum, biological health. The possibility of a large-scale external military operation was discussed but consistently pushed to the edges of the probability distribution. The Scenario Planner gave assassination or military strike only 5% in the Khamenei debate. No persona in the Israel strikes debate modelled joint US-Israeli operations. The regime fall debate never considered a scenario where external strikes would destroy the military command structure.

The system's personas analyse situations from within their respective frameworks, and those frameworks tend to assume that the actors under study are the primary drivers of outcomes. When the decisive actor is external and operating at a scale that redefines the situation entirely, the system underweights it.

A possible improvement: a dedicated challenge dimension in the interrogation protocol that forces each persona to stress-test their analysis against outside interventions, regardless of how unlikely they seem.

Pattern 2: The Base Rate Analyst Consistently Provided a Floor, Not a Forecast

Across all three debates, The Base Rate Analyst assigned the lowest probabilities: 5-12% for Khamenei's departure, 10-25% for an Israeli strike, and 1-5% for regime collapse. Their methodology (start with the historical frequency of similar events, then adjust for current specifics) produced conservative estimates that undershot the outcome on both resolved questions.

The interrogation protocol exposed a core weakness twice. The Scenario Planner and The Risk Analyst both forced concessions by pointing out that statistical averages break in the presence of sudden events. Dividing an annual mortality rate by six assumes risk is spread evenly across the year. Medical emergencies and missile strikes don't work that way.

This suggests the system would benefit from tracking each persona's accuracy over time. If The Base Rate Analyst consistently anchors too low on questions involving potential sudden changes, the aggregation methodology could account for that.

Pattern 3: The Sceptic Won the Vote, But Higher Estimates Were Closer

In the Khamenei and Israel strikes debates, The Sceptic or Trend Analyst won the ranked-choice vote with moderate-to-low probability estimates. The personas with higher estimates (The Insider, The Risk Analyst, The Systems Thinker) were closer to the actual outcomes in probability terms, but their reasoning about mechanisms was often wrong (biological collapse rather than assassination, a unilateral Israeli strike rather than a joint operation).

This creates a puzzle. The voting mechanism rewards the most defensible analysis, not the most accurate prediction. The Sceptic's arguments were well-constructed, evidence-based, and logically coherent. The Insider's arguments relied on unverifiable intelligence claims and aggressive extrapolation. In any given instance, the Sceptic's methodology is more robust. Over time, though, if cautious methodologies consistently win votes while more aggressive ones consistently get closer to outcomes, the voting process may be introducing a conservative bias to the final predictions.

A possible improvement: decoupling the "most convincing analysis" vote from the probability aggregation. The vote could assess reasoning quality while probability estimates are weighted independently, perhaps factoring in each persona's historical accuracy.
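One way to picture this decoupling is an aggregate that ignores the vote entirely and weights each persona's estimate by its past accuracy. The sketch below is a hypothetical illustration, not the system's aggregation method, which this article does not specify. The inverse-Brier weighting, the `floor` parameter, and the function names are all assumptions. The ranges are the ones reported for the Israel strike debate; The Scenario Planner is omitted because no range is reported for them.

```python
def midpoint(low: float, high: float) -> float:
    """Collapse a probability range to a single point estimate."""
    return (low + high) / 2


def weighted_aggregate(
    estimates: dict[str, float],
    past_brier: dict[str, float],
    floor: float = 0.01,
) -> float:
    """Average the estimates, weighting each persona by 1 / past Brier score.

    Lower (better) past scores earn more weight. `floor` stops a perfect
    track record from producing an infinite weight.
    """
    weights = {name: 1.0 / max(past_brier[name], floor) for name in estimates}
    total = sum(weights.values())
    return sum(estimates[name] * weights[name] for name in estimates) / total


israel_strike = {
    "The Insider": midpoint(0.80, 0.90),
    "The Systems Thinker": midpoint(0.70, 0.85),
    "The Contrarian": midpoint(0.60, 0.75),
    "The Sceptic": midpoint(0.15, 0.25),
    "The Trend Analyst": midpoint(0.15, 0.25),
    "The Base Rate Analyst": midpoint(0.10, 0.25),
    "The Risk Analyst": midpoint(0.35, 0.45),
}

# With no track record yet, every persona gets the same placeholder score,
# so the result is just the simple mean of the midpoints (roughly 0.47).
no_history = {name: 0.25 for name in israel_strike}
print(round(weighted_aggregate(israel_strike, no_history), 3))
```

Once real per-persona scores accumulate, replacing `no_history` with them would shift the aggregate toward whichever personas have forecast best, regardless of who wins the vote for most convincing analysis.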

Pattern 4: Content Filter Failures Created Analytical Gaps

In the Khamenei debate, The Systems Thinker hit content filter errors on all three interrogation responses. This meant one of the two highest-probability personas (80-90%) never had its reasoning tested. In a debate that ultimately resolved in the direction their probability indicated, the absence of stress-testing matters.

Content filtering is an inherent constraint when running sensitive geopolitical predictions through LLM providers. The system cannot currently route around these failures. This limitation disproportionately affects the kinds of predictions that are most useful to evaluate: those involving conflict, political violence, and regime stability.

Pattern 5: The System Correctly Identified the Key Variables, But Misjudged Their Interaction

Across the three debates, the system identified nearly every variable that mattered: Iran's nuclear enrichment trajectory, Israeli military preparedness, US carrier deployment, IRGC cohesion, Khamenei's health, the succession deadlock, economic collapse, and protest dynamics. These were not obscure factors. But the system analysed each in relative isolation, or within the scope of individual persona frameworks.

What no persona modelled was the way these factors could combine: that Israeli military preparation, US naval deployment, Iranian internal instability, and the succession crisis could come together in a single coordinated action designed to exploit all of them simultaneously. The actual event was a joint operation targeting nuclear facilities, military infrastructure, and political leadership in one campaign. The system treated each variable as an input to its own probability estimate. Reality treated them as components of a single integrated plan.

This points toward a deeper architectural question: whether multi-agent debate, in its current form, is capable of modelling connected risks across geopolitical domains, or whether the persona-by-persona structure inherently separates interconnected variables.

Summary of Aggregate Predictions vs. Outcomes

Debate	Aggregate Prediction	Outcome	Persona Range
Khamenei out by March 31	30%	Yes (killed Feb 28)	5% to 90%
Israel strikes Iran by March 31	44%	Yes (Feb 28)	10% to 90%
Iranian regime falls by March 31	16%	Unresolved (destabilised)	1% to 40%

The wide persona ranges (5-90%, 10-90%, 1-40%) reflect uncertainty that the aggregation process compressed. Whether that compression was appropriate depends on future calibration data. On these three questions, the aggregate trended toward the cautious end of the distribution, influenced by the voting process selecting careful, well-defended arguments over more aggressive probability estimates that ultimately landed closer to the actual outcomes.

A Note on the Human Cost

This article has discussed events in analytical terms: probabilities, persona frameworks, prediction accuracy. That framing should not obscure what actually happened on February 28 and in the weeks preceding it.

The joint US-Israeli strikes killed at least 201 people across 24 Iranian provinces, according to the Iranian Red Crescent. Among the targets, Israeli strikes hit the Shajareh Tayyebeh girls' elementary school in Minab, killing at least 168 people, most of whom were children. Iran has retaliated with missile and drone strikes, including a ballistic missile that struck a residential area in Beit Shemesh, killing at least 9 people. The conflict continues to escalate.

This follows a period during which the Iranian regime itself carried out mass killings of its own citizens. Human rights organisations estimate that over 7,000 people were killed in the regime's crackdown on protests that began in late December 2025. Reports indicate these killings were carried out under direct orders from Khamenei, and some put the toll as high as 36,000 protesters killed in a two-day span in early January. A nationwide internet blackout accompanied the violence.

Forecasting systems analyse the probabilities of events like these. They cannot capture what those events mean for the people living through them.

These debates were created using Perspectives. The system is under active development, and this analysis is part of the ongoing effort to improve calibration and identify systematic biases.
