How to Lie With Statistics, by Darrell Huff

  • A statistic is only as good as the sample it is drawn from. Assume that a sample is biased: truly random samples are expensive and difficult to obtain, and therefore rare. Is the sample representative? Is it large enough? Are the results significant? What is the sampling error?
  • The word “average”: it can signify the mean, median or mode. These coincide only when the distribution is normal, so an unspecified “average” can be used to mislead. Mean and median can differ substantially in skewed distributions (e.g. income). Consider the distribution as well: range, skewness, reliability (standard error, confidence interval), etc.
  • “A difference is a difference only if it makes a difference”. Statistically significant does not necessarily equal practically significant.
  • Charts can be used to deceive; they may be doctored for “eye appeal”. Pay attention to the scale, especially a truncated or stretched axis.
  • Beware the use of percentages: examine the base of the calculation.
  • Correlation does not equal causation. When two things are correlated: 1) the correlation may be due to chance (spurious), 2) the cause and the effect may actually be reversed, 3) the real cause may be an entirely different thing. Consider the scenarios and possible biases.
  • Five questions to ask:
    1. Who says so? Look for conscious and unconscious biases.
    2. How does he know? Look for properly conducted tests.
    3. What’s missing? Look for omitted information: sample size, distribution, statistical significance.
    4. Did somebody change the subject? Look for a switched base or a reversed cause and effect.
    5. Does it make sense? Subject statistics to common sense.
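
A minimal sketch of two of the points above, the ambiguity of “average” and the base of a percentage, in Python. The income figures are made up for illustration and the pct_change helper is a hypothetical name, neither is from the book.

```python
from statistics import mean, median

# Hypothetical annual incomes: nine ordinary salaries and one very large one.
# These numbers are invented purely to produce a skewed distribution.
incomes = [28_000, 30_000, 32_000, 35_000, 38_000,
           40_000, 45_000, 50_000, 60_000, 1_000_000]

mean_income = mean(incomes)      # pulled upward by the single outlier
median_income = median(incomes)  # the middle of the sorted values

print("mean:  ", mean_income)
print("median:", median_income)


def pct_change(old, new):
    """Percentage change from old to new, measured against old as the base."""
    return (new - old) / old * 100


# The same absolute difference of 50 reads very differently
# depending on which value is used as the base.
drop = pct_change(100, 50)   # base is 100
rise = pct_change(50, 100)   # base is 50

print("100 -> 50:", drop, "%")
print("50 -> 100:", rise, "%")
```

Here both 135,800 (the mean) and 39,000 (the median) could honestly be reported as the “average” income, and the same change of 50 is either a 50% drop or a 100% rise depending on the base.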

Finished: 16-Oct-2010