The Kristi Noem Forecast: How AI Underpriced Transactional Loyalty

On January 30, 2026, I ran a prediction through Perspectives: "Kristi Noem out by March 31?"
The system returned a 23% probability. Eight forecasting personas debated the question, interrogated each other's reasoning, voted, and produced a report concluding that Noem would almost certainly survive (at least until March 31st). The Insider won the debate with the lowest estimate of all eight personas (5-15%), arguing that Trump's public loyalty and Noem's policy alignment made removal implausible.
On March 5, 34 days later, Trump fired Noem as Secretary of Homeland Security.
This is the second prediction I've been able to evaluate against reality (the first covered three Iran-related forecasts published in a previous post). The pattern is becoming clearer, and it points to a specific structural weakness in how the system aggregates predictions.
What Actually Happened
The system ran its analysis on January 30. At that point, Noem was under pressure from the Minneapolis shooting controversy (where two U.S. citizens were murdered by federal agents during an immigration operation) and facing bipartisan Senate criticism. Trump had publicly defended her just three days earlier, on January 27, saying she was doing a "very good job."
Over the following five weeks, the situation escalated. Noem was called to testify before both the Senate and House Judiciary Committees in early March (just a few days ago at the time of writing this post). During those hearings, she faced hostile questioning from both parties over immigration enforcement tactics, a $220 million ad campaign featuring herself, and allegations that her department had obstructed the Inspector General's office. Trump was reportedly "incensed" by her performance during the hearings.
On March 5, Trump announced Noem's removal and named Senator Markwayne Mullin as her replacement, effective March 31. An administration official cited "a culmination of her many unfortunate leadership failures" including the Minneapolis fallout, the ad campaign, allegations of infidelity, staff mismanagement, and feuding with other agency heads. She was offered a consolation role as Special Envoy for a new Western Hemisphere security initiative.
What the System Predicted
The eight Forecaster personas produced a wide range of estimates. Seven of the eight clustered between 10% and 25%. The Risk Analyst was the clear outlier at 65-80%.
| Persona | Probability Range | Midpoint |
|---|---|---|
| The Risk Analyst | 65-80% | 72% |
| The Trend Analyst | 15-25% | 20% |
| The Scenario Planner | 15-25% | 20% |
| The Systems Thinker | 10-25% | 18% |
| The Base Rate Analyst | 10-20% | 15% |
| The Contrarian | 10-20% | 15% |
| The Sceptic | 10-20% | 15% |
| The Insider | 5-15% | 10% |
The 23% aggregate reflected the majority consensus. The Risk Analyst's high estimate pulled the average up, but the weight of agreement pushed the final number firmly into "unlikely" territory.
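For the record, the 23% figure is consistent with a plain mean of the eight midpoints. Whether Perspectives actually uses an unweighted mean is an assumption on my part, but the arithmetic illustrates how little a single 72% outlier moves an average of eight:

```python
# Midpoints from the table above. Assumption: the aggregate is a simple
# unweighted mean of persona midpoints (the report doesn't document the
# actual formula).
midpoints = {
    "Risk Analyst": 0.72,
    "Trend Analyst": 0.20,
    "Scenario Planner": 0.20,
    "Systems Thinker": 0.18,
    "Base Rate Analyst": 0.15,
    "Contrarian": 0.15,
    "Sceptic": 0.15,
    "Insider": 0.10,
}

aggregate = sum(midpoints.values()) / len(midpoints)  # ≈ 0.231, the published 23%

# Even a 72% outlier among seven clustered low estimates only lifts the
# mean by about 7 points over what the low cluster alone would produce.
low_cluster = [p for name, p in midpoints.items() if name != "Risk Analyst"]
low_mean = sum(low_cluster) / len(low_cluster)  # ≈ 0.161
```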
Where the System Got It Right
The system identified nearly every factor that contributed to Noem's removal.
The Risk Analyst flagged the Minneapolis shooting as a "systemic liability" and argued that operational failures would make Noem "politically toxic." That assessment proved accurate. The Scenario Planner mapped three branching paths and acknowledged that a "political pivot" scenario (where Trump cuts losses to redirect the news cycle) was plausible. The Contrarian identified Trump's "transactional loyalty" as a key variable, noting that he can turn on allies once they become liabilities, a pattern Trump has demonstrated many times. Several personas identified the potential for Senate criticism to work against Noem, which is exactly how the final weeks played out.
The system's "What Would Change These Predictions" section is almost a checklist for what happened: Trump publicly distancing himself, media narrative escalation, internal White House frustration with Noem, and policy clashes between Noem and the administration (specifically, Noem telling Congress that Trump had approved the ad campaign, which Trump then publicly denied).
Where the System Got It Wrong
The system correctly identified the variables. It failed to weight them properly.
Seven of eight personas treated Trump's January 27 public defence as a strong protective signal. "When Trump steps in front of a camera and explicitly says someone is doing a 'very good job' and 'won't step down' amid a PR crisis, he is telling us he is not looking for an exit," The Insider argued. This reasoning won the debate. It was also wrong.
That statement held for only about five weeks. The system treated it as a durable indicator of loyalty; in reality, it was a snapshot of a position that quickly shifted. The Scenario Planner actually conceded this vulnerability during the interrogation phase, admitting that anchoring to a 72-hour-old statement to predict a 60-day horizon was "false precision." But the concession didn't move the broader consensus.
The second failure is more systemic. The system correctly identified that Senate criticism could escalate into a genuine threat, but most personas dismissed it as "political theatre" or "noise." The Base Rate Analyst argued that Senate criticism alone had rarely forced a Cabinet departure. The Contrarian argued it would cause Trump to "dig in" on loyalty. Both assessments were wrong. The Senate hearings in March appear to have been the proximate trigger for Noem's removal.
The Outlier Was Right
The Risk Analyst predicted a 65-80% probability of Noem's departure. This was the only estimate in the correct range.
The Risk Analyst's core argument was that the "covariance of risks" (multiple overlapping problems arriving simultaneously) made Noem's position fragile, and that Trump's public defence was a "lagging indicator," a temporary holding pattern while political damage was assessed. This assessment was correct. The convergence of the Minneapolis fallout, the ad campaign controversy, the Inspector General obstruction allegations, and the disastrous congressional hearings created precisely the kind of multi-factor collapse the Risk Analyst described.
During the interrogation phase, three challengers tested the Risk Analyst's reasoning. Two challenges resulted in disputed verdicts, meaning the challengers and the Risk Analyst couldn't reach agreement. One challenge was defended outright. The Risk Analyst held their position under pressure. The rest of the system voted against them anyway.
Calibration Patterns
This is the second resolved prediction where the same pattern appears. In the Iran predictions, cautious analyses won votes while aggressive predictions proved more accurate. The system correctly identified key variables but underweighted their potential impact and failed to model how they could interact.
Specifically, two recurring issues:
The system underweights compounding risk. When multiple negative factors exist simultaneously, the probability of removal increases faster than a simple addition of individual risks would suggest. The Minneapolis shooting alone might not have been enough. The ad campaign alone might not have been enough. The congressional hearings alone might not have been enough. Together, they created a cascade. The Risk Analyst modelled this interaction. The other seven personas treated each factor more or less independently.
The voting mechanism amplifies consensus. When seven personas agree on a low probability and one disagrees, the single transferable vote (STV) system produces a result that reflects the majority view. The Insider won because most personas found the "Trump loyalty" argument convincing. The Risk Analyst's counterargument, that loyalty has an expiry date, was structurally disadvantaged in the voting. This is a known design tension: the voting system is meant to surface the most convincing argument, but "convincing" and "accurate" can diverge, particularly when the majority shares the same analytical blind spot.
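The compounding point can be made concrete. The per-factor odds and the interaction multiplier below are assumed numbers for illustration, not figures from the report; the structural point is that independent triggers already compound faster than any single factor suggests, and positive interaction compounds faster still:

```python
# Illustrative only: 15% per factor and the 1.5x interaction boost are
# assumed numbers, not estimates from the prediction report.
factors = {"Minneapolis fallout": 0.15, "ad campaign": 0.15, "hostile hearings": 0.15}

# Treating each risk in isolation, no single factor looks decisive:
standalone_max = max(factors.values())  # 0.15

# If each factor is an independent trigger, removal needs only one to fire:
p_none = 1.0
for p in factors.values():
    p_none *= 1 - p
at_least_one = 1 - p_none  # ≈ 0.386, already well above any single factor

# A cascade (factors reinforcing each other, as the Risk Analyst's
# "covariance of risks" argument describes) pushes the joint risk higher
# still; a crude multiplier stands in for that interaction here:
cascade = min(1.0, at_least_one * 1.5)  # ≈ 0.579
```

Even with modest per-factor odds and no interaction at all, the joint probability lands far above the 15–25% range most personas settled on.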
What This Means for Perspectives
These results reinforce a finding from the Iran retrospective: prediction accuracy may benefit from mathematical corrections in the aggregation layer. The persona prompts should remain unchanged. The personas identified the right factors and the right risks. The aggregation underweighted the outlier.
The likely improvement path is to track persona-level accuracy over time and adjust weighting based on demonstrated performance on specific types of predictions. For example, if the Risk Analyst consistently outperforms on political-removal predictions, the system should weight their estimates more heavily in that category.
The practical next step is building enough resolved predictions to calculate per-persona Brier scores (a standard accuracy metric for probabilistic forecasts) and establishing whether this pattern holds across a larger sample. Two predictions showing the same pattern is suggestive rather than definitive.
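As a sketch of what that bookkeeping looks like, here is the Brier arithmetic for this one resolved prediction (outcome scored as 1, since Noem departed), plus the mechanics of inverse-Brier weighting. One loud caveat: deriving weights from the same prediction they are then applied to is circular and proves nothing about accuracy; a real system would fit weights on a held-out track record.

```python
# Persona midpoints from the table above.
midpoints = {
    "Risk Analyst": 0.72, "Trend Analyst": 0.20, "Scenario Planner": 0.20,
    "Systems Thinker": 0.18, "Base Rate Analyst": 0.15, "Contrarian": 0.15,
    "Sceptic": 0.15, "Insider": 0.10,
}

# Brier score for a single binary prediction: (forecast - outcome)^2,
# lower is better. Outcome = 1 because the departure happened.
outcome = 1.0
brier = {p: (f - outcome) ** 2 for p, f in midpoints.items()}
# Risk Analyst ≈ 0.078 (best), Insider ≈ 0.810 (worst);
# the 0.23 aggregate itself scores ≈ 0.593.

# Mechanics of inverse-Brier weighting. Circular on one data point
# (the weights come from the prediction they're applied to), so this
# shows only how the aggregation would move, not that it should.
weights = {p: 1 / b for p, b in brier.items()}
reweighted = sum(weights[p] * midpoints[p] for p in midpoints) / sum(weights.values())
# ≈ 0.48, versus the published unweighted 0.23
```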
Resolution Summary
| Prediction | System output | Reality |
|---|---|---|
| Aggregate probability of departure | 23% | Departed (March 5) |
| Most confident persona (retention) | The Insider: 5-15% | Wrong |
| Most confident persona (departure) | The Risk Analyst: 65-80% | Closest to reality |
| Debate winner | The Insider | Wrong |
| Key factors identified | Yes (all major factors present) | Confirmed |
| Factor weighting | Underweighted | Confirmed pattern |
| Time to resolution | 34 days from prediction | Within timeframe |
You can read the full prediction report generated by Perspectives and the full debate.
Read my previous breakdown of the predictions generated by Perspectives about Iran.
Escape the echo chamber.



