Senthilkumar Gopal

Musings of a machine learning researcher, engineer and leader

Review of p-value


The p-value is one of the most commonly used statistics in experimentation. It is often described as the probability that the null hypothesis is true, but that is a misreading. The p-value is the probability, computed in a world (created with math equations) where the null hypothesis holds, of seeing data at least as extreme as the data actually observed, i.e., the p-value shows how consistent the data is with the null hypothesis. So a lower p-value ridicules the null hypothesis, while a large p-value gives no reason to change the default action based on the null hypothesis.

Drug Test

Using [1] as reference, in a drug test between A and B, the null hypothesis is that Drugs A and B are the same. A low p-value is evidence that the two drugs are different, which leads us to reject the null hypothesis. Typically 0.05 is used as the threshold, though this is arbitrary. A p-value of 0.05 means that if the two drugs really were the same, a difference at least as large as the one observed would show up in only about 5% of repeated runs of the experiment.

  • Null Hypothesis: The drugs are the same and patients react the same way
  • Alternate Hypothesis: The drugs are dissimilar and cure the disease to varying degrees
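As a rough illustration of the drug test above, here is a minimal Python sketch that computes a p-value for two groups of patients. The patient counts are made up, and the use of Fisher's exact test is an assumption for this example; the reference [1] may compute the p-value differently.

```python
from math import comb

def fisher_exact_two_sided(cured_a, total_a, cured_b, total_b):
    """Two-sided Fisher exact p-value for a 2x2 cured / not-cured table."""
    cured = cured_a + cured_b
    total = total_a + total_b

    def prob(k):
        # Probability, under the null hypothesis, that exactly k of the
        # cured patients happen to be in group A (hypergeometric).
        return comb(total_a, k) * comb(total_b, cured - k) / comb(total, cured)

    observed = prob(cured_a)
    lowest = max(0, cured - total_b)
    highest = min(total_a, cured)
    # Add up every possible table that is at least as unlikely as the observed one.
    # The small tolerance guards against floating point ties.
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Hypothetical results: number of cured patients out of 50 per drug.
p_different = fisher_exact_two_sided(cured_a=40, total_a=50, cured_b=25, total_b=50)
p_similar = fisher_exact_two_sided(cured_a=30, total_a=50, cured_b=28, total_b=50)

print(f"40/50 vs 25/50 cured: p = {p_different:.4f}")
print(f"30/50 vs 28/50 cured: p = {p_similar:.4f}")
```

With these made-up counts, the first comparison falls below the 0.05 threshold and the second does not, so only the first would lead us to reject the null hypothesis.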

Computing p-value

As referenced from [2], a different test is conducted where the same Drug A is given to two different groups.

  • Null Hypothesis: There is no difference between the two groups, since both receive the same drug
  • Alternate Hypothesis: The two groups react differently

Here the null hypothesis is true by construction: both groups are given the same drug, so any difference between them comes from random things, such as how individual patients respond. Most runs of this experiment would therefore give a large p-value, which is consistent with the effect of Drug A being the same in both groups.

But due to pure random effect, an occasional run with the same drug in both groups will give a small p-value, say p=0.01. That is a False Positive: the test suggests a difference where there is none, and for this round of the experiment we would wrongly reject a null hypothesis that is actually true.
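The false positives described above can be seen in a small simulation. This is only a sketch under stated assumptions: the cure rate of 0.6, the group size of 100, the number of runs, and the choice of a two-proportion z-test are all hypothetical and are not taken from [2].

```python
import random
from math import erfc, sqrt

def two_proportion_p_value(cured_1, cured_2, n):
    """Approximate two-sided p-value (z-test) for two groups of equal size n."""
    pooled = (cured_1 + cured_2) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0  # everyone or no one was cured: no evidence of a difference
    z = abs(cured_1 - cured_2) / n / se
    return erfc(z / sqrt(2))  # two-sided tail area of the standard normal

def false_positive_rate(runs=2000, n=100, cure_rate=0.6, threshold=0.05, seed=0):
    """Fraction of runs with p < threshold when both groups get the same drug."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Both groups share the same cure rate, so the null hypothesis is true.
        group_1 = sum(rng.random() < cure_rate for _ in range(n))
        group_2 = sum(rng.random() < cure_rate for _ in range(n))
        if two_proportion_p_value(group_1, group_2, n) < threshold:
            hits += 1
    return hits / runs

rate = false_positive_rate()
print(f"Fraction of same-drug experiments with p < 0.05: {rate:.3f}")
```

Because the z-test is an approximation and the runs are random, the printed fraction will be close to, but not exactly, 0.05.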

So, across multiple experiments: “A p=0.05 threshold means that 5% of the experiments, where the differences come from random things, will generate a p-value < 0.05”

Using this statement, for the test with Drug A vs Drug B, a p-value of < 0.05 would lead us to reject the null hypothesis and conclude that Drug A is different from Drug B, while accepting that the different reactions might still be just random, i.e., if there were no real difference between the drugs, about 5 in 100 experiment runs would still produce a False Positive. A p-value above the threshold does not prove that the null hypothesis is true; it only means the data gives no reason to reject it. Hence it is important to determine this threshold before running the experiments to prevent being biased by the generated data.

For a stricter threshold, 0.0001 might be used as well, where only about 1 False Positive is expected in 10,000 experiments in which there is no real difference.
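The arithmetic behind these thresholds is simple enough to sketch in a few lines of Python. The function name is hypothetical, and the calculation assumes there is no real difference in any of the experiments.

```python
def expected_false_positives(threshold, experiments):
    """Expected number of false positives when the null hypothesis is true in every experiment."""
    return threshold * experiments

for threshold, experiments in [(0.05, 100), (0.0001, 10_000)]:
    count = expected_false_positives(threshold, experiments)
    print(f"threshold={threshold}: about {count:g} false positive(s) in {experiments} experiments")
```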

Compute the difference

Though a p-value helps decide whether to reject the null hypothesis, it does not provide a mechanism to determine how dissimilar the drugs are. It is important to remember that the p-value measures how surprising the data would be under the null hypothesis, not the scale of the difference between the candidates of the experiment. A small p-value does not by itself imply a large difference.
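One common way to report the scale of the difference is the difference in cure rates together with an approximate confidence interval. The sketch below uses made-up patient counts and a simple Wald interval; both are assumptions for illustration and do not come from the references.

```python
from math import sqrt

def cure_rate_difference(cured_a, total_a, cured_b, total_b, z=1.96):
    """Difference in cure rates (A minus B) with an approximate 95% Wald interval."""
    rate_a = cured_a / total_a
    rate_b = cured_b / total_b
    diff = rate_a - rate_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(rate_a * (1 - rate_a) / total_a + rate_b * (1 - rate_b) / total_b)
    return diff, diff - z * se, diff + z * se

# Hypothetical results: 40 of 50 cured with Drug A, 25 of 50 cured with Drug B.
diff, low, high = cure_rate_difference(40, 50, 25, 50)
print(f"Difference in cure rate: {diff:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

The interval describes how large the difference plausibly is, which is information the p-value alone does not carry.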

References

[1] StatsQuest
[2] StatsQuest