How to analyze ratios?

Don’t do statistics on ratios

Many scientists calculate ratios. They start with 2 groups of data: T (treated) and C (control).
To normalize, they divide all measurements by the control, giving T/C and C/C.
This turns all measurements into relative values: how many times higher or lower is T than C?
Then they typically ask me: can we compare the 2 groups?

The answer is no: after dividing by the control, the 2 groups are no longer independent.
Moreover, all control measurements will be equal to 1 with a spread of 0.
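
A minimal sketch of what the normalization does to the control group. The measurements are made-up numbers and the pairing of each T with one C is an assumption, purely for illustration:

```python
import statistics

# Hypothetical paired measurements (made-up numbers, illustration only)
treated = [12.0, 30.0, 8.0, 21.0]
control = [4.0, 10.0, 5.0, 7.0]

# Divide every measurement by the control of the same pair
treated_ratio = [t / c for t, c in zip(treated, control)]
control_ratio = [c / c for c in control]

print(treated_ratio)                     # T/C values
print(control_ratio)                     # every C/C value is 1.0
print(statistics.stdev(control_ratio))   # spread of the controls is 0.0
```

After normalization the control group carries no information anymore, so there is nothing left to compare T/C against.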

The next question they ask is: can we compare T/C to 1? If ratio=1 then T=C.
The answer is no: working with ratios is not a good idea.

Ratios do not follow a symmetrical distribution: all decreases are squeezed between 0 and 1, while increases can run from 1 to infinity. Therefore they simply cannot be normally distributed.

As such you cannot calculate a mean, SD, SEM or CI for ratios, nor can you do parametric tests on ratios.
Consequently, on graphs ratios cannot be represented by a mean and they cannot have error bars.

I will give an example of why it is wrong to calculate a mean for ratios.
T1/C1 = 4     =>    T is 4x higher than C
T2/C2 = 0.1   =>    T is 10x lower than C
If I calculate the mean = (4+0.1)/2 = 2.05    =>    T is 2x higher than C
The center of 4x higher and 10x lower cannot be 2x higher.

You can use non-parametric tests on ratios since they do not work on the data but on the ranks.
If you absolutely want to show ratios on the graph you can represent the center by the geometric mean.
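
As a small sketch, this is how the arithmetic and geometric mean compare for the two ratios from my example, using only Python's standard library:

```python
import statistics

ratios = [4, 0.1]  # the two T/C ratios from the example

arithmetic = statistics.mean(ratios)            # ordinary mean of the ratios
geometric = statistics.geometric_mean(ratios)   # square root of 4 * 0.1

print(f"arithmetic mean: {arithmetic:.2f}")
print(f"geometric mean:  {geometric:.2f}")
```

The geometric mean ends up below 1, so it points towards "T is lower than C", which is what you expect for the center of 4x up and 10x down.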

The most elegant solution is to use log ratios instead.
They are drawn from a symmetrical distribution and can be normally distributed (you still have to check).

Coming back to my example from above:
T1/C1 = 4         =>    T is 4x higher than C => log2(T1/C1) = 2
T2/C2 = 0.1       =>    T is 10x lower than C => log2(T2/C2) = -3.32
If I calculate the mean log ratio = (2-3.32)/2 = -0.66  =>  2^-0.66 = 0.63  =>  T is about 1.6x lower than C
This does make sense: the center of 4x up and 10x down could be about 1.6x down.
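
The same calculation as a short sketch. The choice of log base 2 follows my example; any other base would work as long as you back-transform with the same base:

```python
import math
import statistics

ratios = [4, 0.1]  # the two T/C ratios from the example

log_ratios = [math.log2(r) for r in ratios]
mean_log = statistics.mean(log_ratios)

# Back-transform the mean log ratio to a fold change
fold = 2 ** mean_log

print(f"log2 ratios:    {[round(x, 2) for x in log_ratios]}")
print(f"mean log ratio: {mean_log:.2f}")
print(f"fold change:    {fold:.2f}  (T is {1 / fold:.2f}x lower than C)")
```

Back-transforming the mean log ratio gives exactly the geometric mean of the original ratios.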

For log ratios you can calculate mean, SD, SEM, CI and you can do parametric tests.
Instead of checking if T/C = 1 you check if log(T/C) = 0 (mathematically this is the same).
On graphs you can represent log ratios by a mean and you can include error bars.
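
A minimal sketch of testing log ratios against 0, written out by hand with the standard library. The six ratios are made-up numbers for illustration, and you still have to check normality of the log ratios yourself:

```python
import math
import statistics

# Hypothetical T/C ratios from six paired experiments (made-up numbers)
ratios = [1.8, 2.4, 0.9, 3.1, 1.5, 2.0]

log_ratios = [math.log2(r) for r in ratios]
n = len(log_ratios)

mean_log = statistics.mean(log_ratios)   # center of the log ratios
sd_log = statistics.stdev(log_ratios)    # sample standard deviation
sem_log = sd_log / math.sqrt(n)          # standard error of the mean

# One-sample t statistic for the null hypothesis: mean log ratio = 0
t_stat = (mean_log - 0) / sem_log

print(f"mean = {mean_log:.3f}, SD = {sd_log:.3f}, SEM = {sem_log:.3f}")
print(f"t = {t_stat:.3f} with {n - 1} degrees of freedom")
```

To turn the t statistic into a p-value you compare it to a t distribution with n-1 degrees of freedom. If SciPy is available, scipy.stats.ttest_1samp(log_ratios, 0) does the whole test in one call.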

When you convert ratios to percentages they’re still ratios

Sometimes scientists multiply the ratios by 100 and say: “Hey, I don’t have ratios, I have percentages now and I can do regular statistics.” But every time you calculate ratios or ratios multiplied by 100, you essentially work with fold changes e.g. drug1 is x times higher than DMSO. You have to do a log transformation.

Coming back to my previous example:
T1/C1 = 4    or when converted to a percentage 400%  =>   T is 4x higher than C
T2/C2 = 0.1  or when converted to a percentage 10%   =>   T is 10x lower than C
If I calculate the mean of the percentages = (400+10)/2 = 205%    =>    T is 2x higher than C
The center of 4x higher and 10x lower cannot be 2x higher.

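Same issue as before: you will systematically underestimate decreases and overestimate increases, because a decrease can never be further than 100 percentage points from the control while an increase has no upper limit. For decreases it will therefore be more difficult to show significance if you do statistics on the percentages. You can solve the issue by working with log ratios.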
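
A small sketch of this asymmetry, using a hypothetical 2-fold increase (200%) and 2-fold decrease (50%) that are not part of my example above:

```python
import math

# Hypothetical percentages: a 2-fold increase and a 2-fold decrease
up_pct, down_pct = 200.0, 50.0

# Distance from the control (100%) on the percentage scale
up_dist = up_pct - 100.0      # 100 percentage points
down_dist = 100.0 - down_pct  # only 50 percentage points

# Distance from the control (log ratio 0) on the log2 scale
up_log = math.log2(up_pct / 100.0)      # +1
down_log = math.log2(down_pct / 100.0)  # -1

print(up_dist, down_dist)  # unequal distances for the same fold change
print(up_log, down_log)    # equal distances, opposite sign
```

On the percentage scale the same fold change counts twice as much when it goes up as when it goes down. On the log scale both are equally far from the control.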

This is not to say that you can never do statistics on percentages, but when you actually want to look at fold changes you should work with log ratios and not with percentages.