Are the data drawn from a normal distribution?

What is checked by a normality test?

Each normality test assumes the data are drawn from a normal distribution and checks how plausible that assumption is given the data. A high p-value means there is no evidence that the assumption is false: the data are consistent with normality. A low p-value means the assumption is very likely false: the data are not normal. This is one of the few occasions in statistics where you actually hope for a high p-value 🙂
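As a minimal sketch of this (assuming Python with NumPy and SciPy; the seed, sample sizes, and distribution parameters are arbitrary), `scipy.stats.shapiro` returns exactly such a p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=50)   # truly normal data
skewed_sample = rng.exponential(scale=2.0, size=50)        # clearly non-normal data

# Shapiro-Wilk: the null hypothesis is "the data are normal".
_, p_normal = stats.shapiro(normal_sample)
_, p_skewed = stats.shapiro(skewed_sample)

print(f"normal sample: p = {p_normal:.3f}")  # typically well above 0.05
print(f"skewed sample: p = {p_skewed:.3f}")  # typically far below 0.05
```

For the normal sample you hope for (and usually get) a high p-value; for the exponential sample the test almost always rejects normality.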

Shapiro-Wilk or D’Agostino-Pearson?

It’s a personal preference. In R, people use Shapiro-Wilk since the D’Agostino-Pearson omnibus test is not implemented in base R. In Prism, most people use D’Agostino-Pearson since it’s recommended by GraphPad. If I can, I do both.

Both tests check normality, but each in a different way. Shapiro-Wilk compares the ordered data values to the values you would expect for a sample from a standard normal distribution (essentially, how straight the QQ plot is).

D’Agostino-Pearson compares the skewness and kurtosis (the heaviness of the tails) of the data to those of a standard normal distribution.

Both tests use very different approaches to check normality so it makes sense to use both of them. In most cases they will agree.
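Running both tests on the same data can be sketched with SciPy (assumed available): `stats.shapiro` implements Shapiro-Wilk and `stats.normaltest` implements the D’Agostino-Pearson omnibus test, which needs roughly 20 or more values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(size=40)  # arbitrary example data

sw_stat, sw_p = stats.shapiro(data)      # Shapiro-Wilk
dp_stat, dp_p = stats.normaltest(data)   # D'Agostino-Pearson omnibus

print(f"Shapiro-Wilk:       p = {sw_p:.3f}")
print(f"D'Agostino-Pearson: p = {dp_p:.3f}")
```

On data like this the two p-values will usually point to the same conclusion.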

What if the normality tests do not agree?

You have to look at the data (e.g. by making a histogram or a QQ plot) and find out why. D’Agostino-Pearson is affected more by outliers than Shapiro-Wilk: outliers have a big impact on the skewness, but not on the slope of the central half of the data in the QQ plot.
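The effect of an outlier on skewness can be made concrete with `scipy.stats.skew` (a sketch; the outlier value 8.0 and the sample size are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
clean = rng.normal(size=30)               # well-behaved normal sample
with_outlier = np.append(clean, 8.0)      # add one extreme value

# A single outlier barely moves the bulk of the data,
# but it inflates the sample skewness dramatically.
print(f"skewness without outlier: {stats.skew(clean):.2f}")
print(f"skewness with outlier:    {stats.skew(with_outlier):.2f}")
```

Since D’Agostino-Pearson tests the skewness directly, that one point can flip its verdict while Shapiro-Wilk stays comparatively unmoved.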

What if you have < 7 replicates per group?

If you have few replicates, you either check the normality of the residuals (see the residual analysis section below) or:

  • Assume the data are normally distributed and do a parametric test. If the data are not really normal the test can generate false positives.
  • Assume the data are not normal and do a non-parametric test. If the data in reality are normal the test can generate false negatives.

Note on very small data sets

If you only have 3 measurements per group, non-parametric tests will be too stringent: with so few values, a test like the Mann-Whitney can never reach p < 0.05, so you’ll typically use a parametric test. However, you have to realize that the outcome might be a false positive.
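Why non-parametric tests are too stringent here can be shown with the Mann-Whitney test in SciPy (a sketch; the values are made up): with 3 values per group, even perfectly separated groups cannot reach p < 0.05.

```python
from scipy import stats

low = [1.0, 2.0, 3.0]
high = [4.0, 5.0, 6.0]  # completely separated from the other group

# With n = 3 per group there are only 20 possible rank orderings,
# so the smallest attainable exact two-sided p-value is 2/20 = 0.10.
_, p = stats.mannwhitneyu(low, high, alternative="two-sided", method="exact")
print(f"best attainable p-value: {p:.2f}")  # 0.10, never significant
```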

Residual analysis

Residuals are calculated for each group separately and then combined: compute the mean of each group, and subtract that mean from every data value in the group.
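The calculation above can be sketched as follows (assuming NumPy and SciPy; the group names, sizes, and values are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
groups = {
    "control": rng.normal(10, 2, size=5),
    "treated": rng.normal(14, 2, size=5),
    "high dose": rng.normal(18, 2, size=5),
}

# Per group: subtract the group mean from every value, then pool.
residuals = np.concatenate(
    [values - values.mean() for values in groups.values()]
)

# One normality test on the pooled residuals instead of
# three underpowered tests on 5 values each.
_, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk on pooled residuals: p = {p:.3f}")
```

Pooling the residuals removes the group means, so the differences between groups no longer masquerade as non-normality.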

QQ plots/histograms for few replicates?

No, they are not useful when you have few replicates: with only a handful of points you cannot judge the shape of a histogram or the straightness of a QQ plot.

How to check normality of large data sets?

Normality tests are not reliable for large data sets. They become too stringent: with many values, even trivial deviations from normality produce a low p-value, so the tests will say the data are not normal when for practical purposes they are.

Histograms are reliable. If you don’t see a bell curve, the data are not normal. For data sets with > 30 values you can assume normality even if the histogram looks a bit skewed. If the histogram looks very skewed, a log transformation will often help. Only in extreme cases, e.g. when you see two bell curves instead of one, may you not assume normality. In that case, the data represent 2 populations instead of 1 and you will have to find the factor that determines these 2 populations, e.g. there might be a difference between males and females or between young and old individuals.
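A sketch of the log-transform remedy (assuming NumPy and SciPy; log-normal data is the classic skewed example, and the parameters here are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.lognormal(mean=2.0, sigma=0.5, size=1000)  # strongly right-skewed
logged = np.log(data)                                 # approximately normal

# The transform pulls the long right tail in, so the histogram
# of the logged values looks like a bell curve again.
print(f"skewness raw:    {stats.skew(data):.2f}")
print(f"skewness logged: {stats.skew(logged):.2f}")
```

If even the logged histogram shows two separate bell curves, no transformation will help: you are looking at two populations, not one.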