Post hoc tests

Comparisons after non-significant ANOVA?

It’s perfectly possible for the p-value of the ANOVA to be non-significant while the p-values of some of the post hoc tests are significant.

The ANOVA is an overall test: it asks whether there is any difference among the group means. Post hoc tests look at individual comparisons: Tukey compares every pair of groups, while Dunnett compares each group with a single control group. These are different questions, so it is possible that one is significant while the other is not.

When to do multiple testing correction?

Whenever you perform multiple statistical tests on the same samples.

Is multiple testing correction required?

Yes, I will explain with another example.

Suppose you think you have found a treatment for COVID-19. You take two sets of patients: one set gets a placebo treatment, the other set gets the treatment you developed. After a few days you measure viral loads in both sets of patients and compare them. Up to this point everything is fine.

Suppose you not only measure viral loads but also body temperature, pain, cough frequency, cough intensity, oxygen saturation, and so on. The more symptoms you compare, the more likely you are to find a difference between the two sets of patients just by chance. For instance, if the treated set of patients contains a lot of tough guys and girls, they might report less pain than the placebo group, but that difference is unrelated to the treatment.

To tackle this issue you need to do multiple testing correction.
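To put a number on "more likely just by chance", here is a minimal sketch. It assumes the tests are independent, which is a simplification (symptoms such as cough frequency and cough intensity are clearly correlated), and the function names are made up for illustration. It computes the chance of at least one false positive across several tests, and the per-test threshold a Bonferroni correction would use instead.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the overall rate at or below alpha."""
    return alpha / n_tests


# 1 test (viral load only), 6 tests (the symptoms listed above), 20 tests
for n_tests in (1, 6, 20):
    print(
        n_tests,
        round(familywise_error_rate(n_tests), 3),
        round(bonferroni_threshold(n_tests), 4),
    )
```

With six outcomes tested at 0.05 each, the chance of at least one spurious "significant" result is already about 26%.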

For which comparisons do you correct?

You need to correct for the comparisons that you actually do. The more comparisons you do the more stringent the correction will be. That’s why it’s important to think about relevant comparisons and to make a clear distinction between controls and groups that you want to compare with.

Negative (blank measurements) and positive controls are just what they are: controls. You use them to check the quality of the measurements. They determine if you need to repeat the experiment. They are not groups that you want to use for comparisons.
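As a rough illustration of why the choice of comparisons matters, the sketch below counts comparisons for a hypothetical experiment and shows the resulting Bonferroni threshold. The group names, the choice of "untreated" as reference group, and the helper function are assumptions made up for this example.

```python
from itertools import combinations

ALPHA = 0.05

# Hypothetical experiment: the blank is a quality control, not a comparison group.
all_groups = ["blank", "untreated", "drug_A", "drug_B", "drug_C"]
quality_controls = {"blank"}
reference = "untreated"

# Only groups that are part of the research question take part in comparisons.
compared = [g for g in all_groups if g not in quality_controls]

# Option 1: compare every group with every other group.
all_pairs = list(combinations(compared, 2))

# Option 2: compare each treatment with the reference group only.
versus_reference = [(reference, g) for g in compared if g != reference]


def bonferroni_alpha(comparisons, alpha=ALPHA):
    """Per-comparison threshold after Bonferroni correction."""
    return alpha / len(comparisons)


print(len(all_pairs), round(bonferroni_alpha(all_pairs), 4))
print(len(versus_reference), round(bonferroni_alpha(versus_reference), 4))
```

Restricting the analysis to the three comparisons against the reference group doubles the per-comparison threshold relative to testing all six pairs. Including the blank in the pairwise comparisons would raise the count to ten.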

Which post-hoc test after Kruskal-Wallis?

You use the Dunn method for non-parametric pairwise comparisons (see Prism tutorial). It corrects for multiple comparisons with a Bonferroni adjustment. The Mann-Whitney test or the Wilcoxon test are not ideal as post hoc tests. If you perform an ordinary Wilcoxon or Mann-Whitney test, two problems arise:

  • the ranks used in these pairwise tests are not the ranks used by the Kruskal-Wallis test
  • these tests do not use the pooled variance implied by the Kruskal-Wallis null hypothesis

Dunn’s test does not have these problems… more info here.
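To make the two points above concrete, here is a minimal pure-Python sketch of Dunn's test with a Bonferroni adjustment. It is an illustration under stated assumptions, not the exact procedure Prism runs: the function names and the example data are invented, and p-values come from the normal approximation. Note how the ranks are computed once on the pooled data and how a single pooled variance is used for every pair.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 to N; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def dunn_test(groups):
    """groups: dict mapping group name -> list of measurements.
    Returns a dict mapping (a, b) -> (z, raw p-value, Bonferroni-adjusted p-value)."""
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    ranks = average_ranks(pooled)  # same ranks as the Kruskal-Wallis test
    n_total = len(pooled)

    # Mean rank per group, taken from the pooled ranking
    mean_rank = {}
    start = 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        start += n

    # Pooled variance of the ranks, with the usual correction for ties
    tie_counts = {}
    for x in pooled:
        tie_counts[x] = tie_counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values()) / (12 * (n_total - 1))
    pooled_var = n_total * (n_total + 1) / 12 - tie_term

    pairs = list(combinations(names, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt(pooled_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results[(a, b)] = (z, p_raw, min(1.0, p_raw * len(pairs)))
    return results


# Made-up data, for illustration only
example = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]}
for pair, (z, p_raw, p_adj) in dunn_test(example).items():
    print(pair, round(z, 3), round(p_raw, 4), round(p_adj, 4))
```

In this made-up example only the A versus C comparison stays below 0.05 after adjustment, even though every value in B is larger than every value in A.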

Further reading