Senthilkumar Gopal

Musings of a machine learning engineer

Paradoxes in statistics


Came across a tweet listing statistical paradoxes and wanted to learn what each of them means.

Absence of evidence fallacy

The absence of evidence fallacy occurs when someone treats a lack of evidence as proof that something is false or does not exist. The problem with this line of reasoning is that a lack of evidence is just that: a lack. It is not the same as evidence of absence, which is evidence of any kind that positively suggests something is missing or does not exist.
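A small Bayesian calculation makes the distinction concrete. Suppose we search for evidence of some hypothesis H using a procedure that finds it with probability `sensitivity` when H is true and raises a false alarm with probability `false_alarm` when H is false. The function name and all numbers below are hypothetical, chosen only for illustration. Finding nothing is strong evidence of absence only when the search was likely to succeed.

```python
def posterior_after_no_evidence(prior, sensitivity, false_alarm):
    """P(H | search found nothing), by Bayes' rule.

    prior:       P(H) before searching (hypothetical)
    sensitivity: P(search finds evidence | H)
    false_alarm: P(search finds evidence | not H)
    """
    # Probability of finding nothing under each hypothesis.
    miss_if_true = 1 - sensitivity
    quiet_if_false = 1 - false_alarm
    numerator = miss_if_true * prior
    return numerator / (numerator + quiet_if_false * (1 - prior))


prior = 0.5
# A cursory search that would usually miss the evidence even if H were true.
weak = posterior_after_no_evidence(prior, sensitivity=0.10, false_alarm=0.05)
# A thorough search that would almost certainly find the evidence if H were true.
strong = posterior_after_no_evidence(prior, sensitivity=0.99, false_alarm=0.05)

print(f"cursory search:  P(H | nothing found) = {weak:.3f}")
print(f"thorough search: P(H | nothing found) = {strong:.3f}")
```

With these made-up numbers the cursory search barely moves the belief (about 0.49), while the thorough search drops it to about 0.01.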

Reference

Ecological fallacy

The ecological fallacy is the mistake of assuming that what is true of a group is also true of the individual members of that group. In statistical analysis, it is the error of inferring that a relationship observed in aggregate data also holds at the individual level.
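A minimal simulation, assuming five hypothetical groups whose averages rise together in x and y while, inside every group, y falls as x rises. The group-level correlation is strongly positive, but it says nothing reliable about individuals. The `pearson` helper is my own, written out to keep the sketch dependency-free.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


random.seed(0)
groups = []
for g in range(5):
    # Hypothetical group centres that increase together in x and y ...
    cx, cy = 10.0 * g, 10.0 * g
    xs, ys = [], []
    for _ in range(200):
        d = random.uniform(-3, 3)
        xs.append(cx + d)
        # ... while within the group, y decreases as x increases.
        ys.append(cy - d + random.gauss(0, 0.5))
    groups.append((xs, ys))

group_mean_x = [sum(xs) / len(xs) for xs, _ in groups]
group_mean_y = [sum(ys) / len(ys) for _, ys in groups]
aggregate_r = pearson(group_mean_x, group_mean_y)
within_r = [pearson(xs, ys) for xs, ys in groups]

print(f"correlation of group averages: {aggregate_r:+.2f}")
print("within-group correlations:", [f"{r:+.2f}" for r in within_r])
```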

Reference

Stein’s paradox

Stein’s example (or phenomenon or paradox), in decision theory and estimation theory, is the phenomenon that when three or more parameters are estimated simultaneously, there exist combined estimators more accurate on average (that is, having lower expected mean squared error) than any method that handles the parameters separately.
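A sketch of the classic James-Stein estimator, assuming a single observation x ~ N(θ, I) of a p-dimensional mean with p = 10 and a hypothetical true θ of all ones. Estimating each θᵢ by xᵢ on its own has expected total squared error p; shrinking the whole vector towards the origin by 1 − (p − 2)/‖x‖² does better on average whenever p ≥ 3.

```python
import random


def james_stein(x):
    """James-Stein estimate of the mean vector from one observation x ~ N(theta, I).

    Shrinks x towards the origin by the factor 1 - (p - 2) / ||x||^2; needs p >= 3.
    """
    p = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = 1 - (p - 2) / norm_sq
    return [shrink * v for v in x]


def squared_error(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


random.seed(1)
theta = [1.0] * 10  # hypothetical true means, p = 10
trials = 5000
mse_separate = mse_js = 0.0
for _ in range(trials):
    x = [random.gauss(t, 1.0) for t in theta]
    # Separate estimation: each x_i estimates theta_i on its own.
    mse_separate += squared_error(x, theta) / trials
    # Combined estimation: every coordinate is shrunk using the whole vector.
    mse_js += squared_error(james_stein(x), theta) / trials

print(f"separate estimates: {mse_separate:.2f}")
print(f"James-Stein:        {mse_js:.2f}")
```

The origin is an arbitrary shrinkage target: any fixed point works, and the gain is largest when θ happens to lie close to it. In practice the "positive-part" variant, which clips the shrinkage factor at zero, is usually preferred.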

Reference

Lord’s paradox

When two groups are compared in a pre-post study, a two-sample t-test on the change scores (post minus pre) and an analysis of covariance (ANCOVA) can lead to two different conclusions. This is known as Lord’s paradox, and it occurs because the parameter estimated by the two-sample t-test and the parameter of interest in the ANCOVA model are not the same quantity. The difference between the two parameters can be explained by the covariance of linear combinations of random variables, an important topic in introductory statistical theory courses.
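A simulation sketch, assuming two hypothetical groups whose members have a stable true value measured twice with noise, so nothing changes between pre and post. Both estimates can be written as (mean post difference) minus a multiple of (mean pre difference): the change-score comparison uses a multiple of 1, while ANCOVA uses the estimated within-group slope β. When β is below 1 and the groups differ at baseline, the two disagree. Only point estimates are computed here, not t statistics or p-values.

```python
import random

random.seed(2)


def simulate_group(true_mean, n=2000):
    """Hypothetical group: a stable true value measured twice with noise, no real change."""
    pre, post = [], []
    for _ in range(n):
        true_value = random.gauss(true_mean, 5.0)
        pre.append(true_value + random.gauss(0, 3.0))
        post.append(true_value + random.gauss(0, 3.0))
    return pre, post


def mean(v):
    return sum(v) / len(v)


groups = {"A": simulate_group(60.0), "B": simulate_group(70.0)}

# Change-score comparison: what the two-sample t-test on (post - pre) estimates.
change = {g: mean([b - a for a, b in zip(pre, post)]) for g, (pre, post) in groups.items()}
change_score_effect = change["B"] - change["A"]

# ANCOVA with a common slope: beta is the pooled within-group regression of post on pre.
sxy = sxx = 0.0
for pre, post in groups.values():
    mx, my = mean(pre), mean(post)
    sxy += sum((a - mx) * (b - my) for a, b in zip(pre, post))
    sxx += sum((a - mx) ** 2 for a in pre)
beta = sxy / sxx

(pre_a, post_a), (pre_b, post_b) = groups["A"], groups["B"]
ancova_effect = (mean(post_b) - mean(post_a)) - beta * (mean(pre_b) - mean(pre_a))

print(f"change-score group difference: {change_score_effect:+.2f}")
print(f"ANCOVA slope beta:             {beta:.2f}")
print(f"ANCOVA adjusted difference:    {ancova_effect:+.2f}")
```

Here the change-score analysis finds essentially no group difference, while ANCOVA reports a clear one, because measurement noise pulls β below 1.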

Reference

Simpson’s paradox

Simpson’s paradox, which also goes by several other names, is a phenomenon in probability and statistics in which a trend appears in several groups of data but disappears or reverses when the groups are combined. This result is often encountered in social-science and medical-science statistics,[1][2][3] and is particularly problematic when frequency data is unduly given causal interpretations.
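A worked example with illustrative counts, in the style of the frequently cited kidney-stone treatment example (the "mild"/"severe" labels and the helper names are my own). Treatment A has the higher success rate within each severity group, yet B has the higher rate once the groups are pooled.

```python
# Illustrative (successes, total) counts for two treatments, split by case severity.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, total):
    return successes / total


for stratum in ("mild", "severe"):
    ra = rate(*counts["A"][stratum])
    rb = rate(*counts["B"][stratum])
    print(f"{stratum:>6}: A {ra:.1%}  B {rb:.1%}")


def pooled(treatment):
    """Success rate after ignoring severity and adding up all cases."""
    s = sum(c[0] for c in counts[treatment].values())
    n = sum(c[1] for c in counts[treatment].values())
    return s / n


pooled_a, pooled_b = pooled("A"), pooled("B")
print(f"pooled: A {pooled_a:.1%}  B {pooled_b:.1%}")
```

The reversal happens because A was mostly given the severe cases (263 of its 350), where success rates are lower for every treatment, while B was mostly given mild ones.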

Reference

Berkson’s paradox

Berkson’s paradox (also known as Berkson’s fallacy or Berkson’s bias) is the counter-intuitive result that two independent events can appear correlated in a sample that was selected on the basis of those events. Take two events, A and B, that are completely independent in the general population (for example, lung cancer and diabetes). If a study draws only from hospital patients, who were admitted because they have A or B (or both), then among those patients the presence of diabetes will make the presence of lung cancer look less likely. The data seem to show a connection, but it is an artifact of how the sample was selected, not a real relationship between the two conditions.
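A simulation sketch of the hospital version, assuming hypothetical prevalences of 10% for A and for B (drawn independently) plus a 5% chance of admission for unrelated reasons. In the whole population A and B are unrelated; among admitted patients, having B makes A look much less likely.

```python
import random

random.seed(3)
N = 100_000
P_A, P_B, P_OTHER = 0.10, 0.10, 0.05  # hypothetical rates

people = []
for _ in range(N):
    a = random.random() < P_A          # condition A, independent of B
    b = random.random() < P_B          # condition B
    other = random.random() < P_OTHER  # admitted for some unrelated reason
    admitted = a or b or other
    people.append((a, b, admitted))


def p_a_given(rows, b_value):
    """Share of rows with A among rows whose B status equals b_value."""
    subset = [a for a, b, _ in rows if b == b_value]
    return sum(subset) / len(subset)


hospital = [row for row in people if row[2]]
pop_with_b, pop_without_b = p_a_given(people, True), p_a_given(people, False)
hosp_with_b, hosp_without_b = p_a_given(hospital, True), p_a_given(hospital, False)

print(f"population: P(A|B)={pop_with_b:.3f}  P(A|not B)={pop_without_b:.3f}")
print(f"hospital:   P(A|B)={hosp_with_b:.3f}  P(A|not B)={hosp_without_b:.3f}")
```

With these made-up rates, both population shares are about 0.10, but among admitted patients P(A | not B) is roughly 0.7: anyone admitted without B is quite likely to have A, because A is one of the few routes into the sample.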

Reference

Prosecutor’s fallacy

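The prosecutor’s fallacy is the error of confusing the probability of the evidence given innocence with the probability of innocence given the evidence. For example, if a forensic match would occur by chance in only 1 in 10,000 innocent people, it is tempting to conclude that the defendant has only a 1 in 10,000 chance of being innocent; but the actual chance depends on how many people could have produced the match and on the other evidence in the case.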
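The prosecutor’s fallacy confuses P(evidence | innocent) with P(innocent | evidence). A minimal worked example, assuming hypothetical numbers: a forensic match that occurs by chance for 1 in 10,000 innocent people, a pool of 1,000,000 people who could plausibly be the source, exactly one of whom is guilty, and no other evidence. The function name is my own.

```python
def p_guilty_given_match(pool_size, random_match_prob, p_match_if_guilty=1.0):
    """Bayes' rule with a uniform prior: everyone in the pool is equally likely to be guilty."""
    prior_guilty = 1 / pool_size
    p_match = p_match_if_guilty * prior_guilty + random_match_prob * (1 - prior_guilty)
    return p_match_if_guilty * prior_guilty / p_match


random_match_prob = 1e-4  # P(match | innocent), hypothetical
posterior = p_guilty_given_match(pool_size=1_000_000, random_match_prob=random_match_prob)

print(f"P(match | innocent) = {random_match_prob}")
print(f"P(guilty | match)   = {posterior:.4f}")
```

Under these assumptions the match alone leaves only about a 1% chance of guilt, because roughly 100 innocent people in the pool would also be expected to match.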

Gambler’s fallacy

The gambler’s fallacy is the belief that the probability of an outcome after a series of outcomes differs from the probability of that outcome on a single trial (for example, that tails is “due” after a run of heads). The belief is false whenever the events in question are independent and identically distributed: each new trial has the same probability regardless of what came before.
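A quick simulation, assuming a fair coin simulated with Python’s `random` module: look at every flip that immediately follows five or more heads in a row and check how often it comes up heads.

```python
import random

random.seed(4)
STREAK = 5
flips = [random.random() < 0.5 for _ in range(500_000)]  # True = heads, fair coin

run = 0  # number of consecutive heads immediately before the current flip
after_streak = []
for heads in flips:
    if run >= STREAK:
        after_streak.append(heads)
    run = run + 1 if heads else 0

p_heads_after_streak = sum(after_streak) / len(after_streak)
print(f"{len(after_streak)} flips followed {STREAK}+ heads in a row")
print(f"share of heads on those flips: {p_heads_after_streak:.3f}")
```

The share stays close to 0.5: a streak carries no information about the next independent flip.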

Reference

Lindley’s paradox

Lindley’s paradox is a counterintuitive situation in statistics in which the Bayesian and frequentist approaches to a hypothesis testing problem give different results for certain choices of the prior distribution. At its root, it reflects the difficulty of reconciling two paradigms, Bayesian and frequentist statistics, which define probability differently:

  • Bayesian: probability is a (unique) measure of degree of belief (see, e.g., Cox’s theorem in Chap. 2 of Jaynes[3])
  • Frequentist: probability is the (asymptotic) frequency at which an outcome occurs in a hypothetical sequence of repeated trials
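A worked sketch of the disagreement, assuming hypothetical counts of k = 49,581 successes in n = 98,451 trials and testing H0: θ = 0.5. The frequentist side uses a two-sided z-test (normal approximation to the binomial); the Bayesian side compares H0 against H1 with a uniform prior on θ, under which the marginal likelihood of the data is exactly 1/(n + 1).

```python
from math import erfc, exp, lgamma, log, sqrt

n, k = 98_451, 49_581  # hypothetical: k successes out of n trials; H0: theta = 0.5

# Frequentist: two-sided p-value from the normal approximation to the binomial.
z = (k - 0.5 * n) / sqrt(0.25 * n)
p_value = erfc(abs(z) / sqrt(2))

# Bayesian: H0 fixes theta = 0.5; H1 puts a uniform prior on theta in (0, 1).
log_lik_h0 = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + n * log(0.5)
log_marginal_h1 = -log(n + 1)  # binomial likelihood integrated over the uniform prior
bayes_factor_01 = exp(log_lik_h0 - log_marginal_h1)
posterior_h0 = bayes_factor_01 / (1 + bayes_factor_01)  # equal prior odds on H0 and H1

print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
print(f"Bayes factor (H0 vs H1) = {bayes_factor_01:.1f}, P(H0 | data) = {posterior_h0:.3f}")
```

With these numbers the test rejects H0 at the 5% level (p is about 0.02), while the Bayes factor favours H0 by roughly 19 to 1. For a fixed p-value the Bayes factor in favour of H0 grows roughly like √n, and the result depends on the prior chosen under H1 (here the uniform one).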

Reference | Reference

Low birthweight paradox

The low birth-weight paradox is an apparently paradoxical observation about the birth weights and mortality rates of children born to mothers who smoke tobacco. Traditionally, babies weighing less than a certain threshold (which varies between countries) are classified as having low birth weight. In a given population, low-birth-weight babies have a significantly higher mortality rate than others; thus, populations with a higher rate of low birth weight typically also have higher rates of child mortality than other populations. Yet low-birth-weight children born to smoking mothers have a lower infant mortality rate than the low-birth-weight children of non-smokers. It is an example of Simpson’s paradox.
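One commonly proposed explanation is selection on birth weight: low birth weight can be caused either by smoking or by other, more dangerous problems, so a low-weight baby of a non-smoker is more likely to be low-weight because of one of those dangerous problems. The simulation below is a hypothetical sketch of that mechanism; the `defect` variable and every rate in it are invented for illustration, not estimates from real data.

```python
import random

random.seed(5)
N = 100_000
rows = []
for _ in range(N):
    smoker = random.random() < 0.30
    defect = random.random() < 0.05  # hypothetical serious condition, independent of smoking
    # Both smoking and the defect can cause low birth weight; the defect's effect is larger.
    p_low = 0.05 + 0.25 * smoker + 0.60 * defect
    low_weight = random.random() < p_low
    # Smoking raises mortality a little; the defect raises it a lot.
    p_death = 0.01 + 0.01 * smoker + 0.30 * defect
    died = random.random() < p_death
    rows.append((smoker, low_weight, died))


def mortality(smoker, low_weight=None):
    """Death rate for one smoking status, optionally restricted to one birth-weight class."""
    deaths = [d for s, lw, d in rows if s == smoker and (low_weight is None or lw == low_weight)]
    return sum(deaths) / len(deaths)


print(f"all babies: smokers {mortality(True):.3f}  non-smokers {mortality(False):.3f}")
print(f"low weight: smokers {mortality(True, True):.3f}  non-smokers {mortality(False, True):.3f}")
```

In this toy model smoking is harmful overall, yet among low-birth-weight babies the smokers’ babies die less often, because their low weight is more often due to smoking and less often due to the more dangerous condition.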

Reference