Superforecasting: The Art and Science of Prediction, by Philip E. Tetlock and Dan Gardner

After the Iraq WMD intelligence failure, the U.S. intelligence community sponsored a forecasting tournament to identify best practices in prediction. Within a few years, the top amateur forecasters were besting professional intelligence analysts. These “superforecasters”, though smarter than average, were not superhuman. They thought differently, and we can all use insights gleaned from this experiment to make better predictions. Tetlock’s ideas are scientifically founded and well reasoned. While teaching us about biases in decision-making, he turns the same scrutiny on himself, examining whether his own ideas are biased; I found that thoughtful and thorough. Learning how to think better is a skill that applies to many areas of decision-making, but for me particularly to investing. Another important idea is rigorous measurement: without it, we are all “blind men arguing about the colors of the rainbow”. Overall, the book contained many useful insights on making predictions.

  • “The average expert is roughly as accurate as a dart-throwing chimpanzee (i.e. random guessing).” 
  • The reason people believe TV forecasters is not that they are accurate (they rarely have verifiable track records); it is that they tell a compelling story with conviction. Forecasts are often made for entertainment, comfort, simplification, or politics.  
  • The predictability of something depends on what is being predicted, over what time horizon, and under which conditions. Weather is predictable, but only a few days out. In a complex world, forecasting accuracy generally has a limit of a few years out. 
  • “You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal.” —Bill Gates. Simple, yet it is amazing how often measurement is not done. 
  • Medicine similarly went through a scientific revolution in the 1920s, de-emphasizing physician intuition and perception and relying instead on rigorous experimentation and measurement. Before that, doctors fiercely defended silly practices such as bloodletting and induced vomiting.    
  • The alternative to controlled experiments, which can provide genuine insight, is uncontrolled observation, which offers only the illusion of insight.  
  • Well-validated statistical algorithms have been shown to consistently outperform subjective human judgment. If a good, cheap algorithm exists, use it. Subjective judgment is useful where good statistics do not exist; a combination of the two is the most powerful. 
  • ‘System 1’ thinking is fast and intuitive; ‘System 2’ is slow and deliberate. We evolved to operate automatically with System 1 (WYSIATI: What You See Is All There Is—availability heuristic), since it helps our survival, but it is less rigorous and accurate than System 2, because it does not fully consider the quality of the available evidence.  
  • Our brains seek a coherent narrative and therefore confirmation bias leads us to avoid doubt and ignore contradictory evidence. To get closer to the truth, question compelling but incomplete “tip-of-the-nose” conclusions, and explore alternative hypotheses.  
  • ‘Blink’ vs. ‘think’, or intuition vs. analysis. Good intuition ‘blink’ accuracy depends on pattern recognition and deep experience in a field that contains valid cues. Good analytical ‘think’ accuracy depends on robust experimentation and statistical confidence in a field that contains ample data. 
  • A forecast should be specific, explicit, and unambiguous, and it must contain a timeframe. And since history cannot be re-run, many forecasts are needed to establish a track record; otherwise a forecaster is impossible to judge. 
  • Forecasters therefore tend to be vague and general, to avoid landing on the “wrong side of maybe”. A cleverly untestable forecast can be retroactively interpreted generously. 
  • Brier Score: a measure of forecast accuracy combining calibration (how closely stated probabilities match observed frequencies) and resolution (how decisively forecasts depart from 50-50). 
    • 0: perfect omniscience 
    • 0.5: random 50-50 guessing 
    • 2.0: perfect anti-omniscience 
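As a concrete sketch, the two-outcome Brier scoring described above can be computed as follows (the function name and sample numbers are mine, not from the book):

```python
def brier_score(forecasts, outcomes):
    """Brier score in the original two-outcome form (Brier, 1950).

    forecasts: probabilities assigned to "event happens" (0.0-1.0)
    outcomes:  1 if the event happened, 0 if it did not

    Per forecast: (p - o)^2 + ((1 - p) - (1 - o))^2, averaged over all
    forecasts. Ranges from 0.0 (perfect omniscience) through 0.5
    (constant 50-50 guessing) to 2.0 (always fully confident and wrong).
    """
    total = 0.0
    for p, o in zip(forecasts, outcomes):
        total += (p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
    return total / len(forecasts)

# A hedging 50-50 guesser scores 0.5 no matter what happens:
print(brier_score([0.5, 0.5], [1, 0]))   # 0.5
# Confident-and-right beats confident-and-wrong:
print(brier_score([0.9, 0.9], [1, 1]))   # ~0.02
print(brier_score([0.9, 0.9], [0, 0]))   # ~1.62
```

Note how the score punishes misplaced confidence: the same 90% forecasts score near-perfect when right but catastrophically when wrong, which is exactly why vague “maybe” forecasting feels safer.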
  • Tetlock’s 21-year Expert Political Judgment project: ideological “Big Idea hedgehogs” underperformed versatile, eclectic “foxes”. Hedgehogs see everything through the lens of their “big idea”, distorting their view; foxes see things from multiple perspectives and are doubtful and nimble. Hedgehogs are nonetheless more reassuring, as people are naturally drawn to simple, confident, and compelling narratives. 
  • The “wisdom of crowds” (“the miracle of aggregation”) is startlingly accurate. It harnesses the power of aggregating the good judgments of disparate sources of information while nullifying their (random) bad judgments. Aggregating the judgments of different experts is more powerful when the pool of information is larger and more diverse. Aggregations of aggregations from disparate sources can be more powerful still. Foxes likewise integrate many different models of thinking to synthesize an aggregated conclusion. 
  • “All models are wrong, but some are useful.” —statistician George Box  
  • Good decisions can lead to bad outcomes. Don’t confuse reasonable prospective decisions with incorrect retrospective results.  
  • When it comes to evaluating the accuracy of forecasts, don’t rely on intuition. Rigorously measure. 
  • Superforecasting best practices: 
    • “Fermi-ize” a problem: break it up into smaller digestible pieces and estimate 
    • Outside view: first determine the base rate—the frequency of the outcome across the broader class of similar cases
    • Inside view: research and assign probabilities to the smaller problems, and adjust the base rate up or down based on the unique circumstances of the problem 
    • Be sensitive to scope: timeframes and degrees of doubt 
    • Avoid cognitive illusions: be aware of thinking traps and biases 
    • Get feedback: ask others to critique your analysis and consider others’ perspectives. Seek diverse and disconfirming views. Discuss and debate points and counterpoints
    • Be self-critical: “the crowd within”—step back, wait awhile, assume your original estimate was wrong, then reassess.
    • Update: re-evaluate as new information arrives, without over- or under-reacting and without becoming overconfident 
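The “Fermi-ize” and outside-view steps above can be illustrated with the classic question the book walks through, “How many piano tuners are in Chicago?” Every number below is a rough, illustrative assumption; the value of the exercise is the decomposition, not the inputs:

```python
# Fermi-izing "How many piano tuners are in Chicago?"
# Break the unknowable-sounding question into estimable pieces.
chicago_population = 2_500_000          # rough guess
people_per_household = 2.5              # rough guess
piano_ownership_rate = 0.05             # guess: ~1 in 20 households
tunings_per_piano_per_year = 1          # guess
hours_per_tuning = 2                    # including travel time
tuner_hours_per_year = 40 * 50          # a full-time work year

pianos = chicago_population / people_per_household * piano_ownership_rate
demand_hours = pianos * tunings_per_piano_per_year * hours_per_tuning
tuners = demand_hours / tuner_hours_per_year
print(round(tuners))  # ~50: an order-of-magnitude estimate
```

The result is not meant to be exact; the point is that multiplying several defensible rough estimates lands within the right order of magnitude, whereas an unaided gut guess often does not.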
  • Superforecaster characteristics: 
    • Personality: self-improving (most important), intellectually curious (high need for cognition), humble, cautious, gritty 
    • Abilities: numerate, analytical, above average intelligence 
    • Thinking: actively open-minded, reflective, probabilistic, secular, thoughtful updater 
  • People prefer certainty. They value 0% and 100%, or a confident yes/no, much more than an in-between “maybe” or probabilities. Probability and statistics are not intuitive for most people. 
  • In a complex world, nothing is 100% certain. Even science, a process that aims to gradually accumulate evidence to dispel uncertainty, still arrives at conclusions that can only be tentative, because disconfirming evidence, though improbable, may emerge. 
  • Two types of uncertainty: 
    • epistemic: knowable unknowns—things we do not know but could in principle learn 
    • aleatory: unknowable unknowns—irreducible randomness that cannot be known in advance 
  • Religion may be anathema to probabilistic thinking (psycho-logic vs. logic). Religious people believe in fate and assign retrospective meaning (narrative illusion and hindsight bias, though also psychologically healthy); scientists accept indeterminate chance and pointless randomness. 
  • Belief perseverance: confirmation bias and ego protection that make a person refuse to change their views in light of new disconfirming information that contradicts their self-identity or worldview (like removing a base block from a Jenga tower). 
  • Teams can be good (wisdom of crowds, diverse views) or bad (groupthink, cognitive loafing, madness of crowds); they must be managed properly. Aim for independent judgment, engaged members, constructive debate, and information sharing, and conduct a “pre-mortem”.
  • Individual ordinary forecasters < wisdom of the crowd <= teams of ordinary forecasters < prediction markets < teams of superforecasters < teams of forecasters plus a simple extremizing algorithm. “Diversity trumps ability”—extremizing the aggregate of diverse forecasts harnesses that diversity.  
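A minimal sketch of aggregation plus extremizing, assuming a simple odds-power transform of the kind used in the forecasting-aggregation literature (the exponent `a` and the sample probabilities are illustrative, not values from the book):

```python
def aggregate(probs):
    """Wisdom of crowds at its simplest: average the probabilities."""
    return sum(probs) / len(probs)

def extremize(p, a=2.5):
    """Push an aggregate probability away from 50-50.

    Because each forecaster holds only part of the total evidence, the
    plain average tends to be underconfident; this odds-power transform
    corrects for that. The exponent `a` is a tuning parameter.
    """
    return p ** a / (p ** a + (1 - p) ** a)

crowd = [0.6, 0.7, 0.65, 0.75, 0.7]   # five independent-ish forecasts
p = aggregate(crowd)                  # 0.68
print(round(extremize(p), 2))         # pushed out toward ~0.87
```

The transform leaves 0.5 unchanged and is symmetric, so it only sharpens a signal the diverse crowd already agrees on; it does not manufacture confidence from a genuinely split crowd.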
  • A leader must be resolute and confident, which conflicts with the qualities a superforecaster needs: uncertainty, doubt, self-criticism, and humility. The trick is to maintain intellectual humility while still acting with confidence. 
  • Do not conflate facts with values. Something you despise may nonetheless have impressive qualities.  
  • Dwight Eisenhower: “Plans are useless, but planning is indispensable.” Reality is too complex for any fixed plan, but the exercise of planning is essential.  
  • Like medicine, other fields are going through an evidence-based, quantitative, analytical revolution—sports, charities, even politics.  
  • While numbers aren’t perfect, they can be a useful tool. Measure, analyze, keep score! Learn from the results feedback.

Finished: 24-Feb-2019. Rating: 9/10.