RMSE - The oft misinterpreted metric

I have now encountered the following error in two separate books (edit: and now a Wikipedia article). One of the books is literally about statistics; the other is more about data mining.

RMSE IS NOT "How much error we expect to see on average".

Unfortunately, people often make this mistake. Let's look at the formula for RMSE, where y_i is the true value, h(x_i) is the prediction for it, and n is the number of data points.

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - h(x_i))^2} $$

"How much error we expect to see on average" is a plain-language description of expected error. What is "error", you might ask? I take it to mean, as I think most other people do, the absolute deviation of a prediction from its true value.

We can estimate the expected error by averaging the error over all data points (ideally on a test set). For a data set of n points, the estimate is:

$$\mathbb{E}[\text{error}] \approx \frac{1}{n}\sum_{i=1}^{n}|y_i-h(x_i)|$$

This is known as the Mean Absolute Error (MAE), and it is the quantity that may be interpreted as "how much error we expect to see on average". RMSE does not have such an easy interpretation; it is closer to a standard deviation of the predictions around the true values. When making business decisions or thinking about how a prediction will be used in real life, it is often very useful to know "how much a prediction will be off on average". RMSE is not this!
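
To make the two definitions concrete, here is a minimal sketch of both metrics as R functions. The function names `rmse` and `mae` and the four made-up data points are illustrative choices, not something from the books quoted later.

```r
# Root mean squared error: square the errors, average them, take the square root
rmse <- function(y, pred) sqrt(mean((y - pred)^2))

# Mean absolute error: average the absolute errors
mae <- function(y, pred) mean(abs(y - pred))

# Tiny hand-checkable example (made-up numbers)
y_true <- c(1, 2, 3, 4)
y_pred <- c(1, 2, 3, 8)   # errors: 0, 0, 0, 4

mae(y_true, y_pred)    # 4 / 4 = 1
rmse(y_true, y_pred)   # sqrt(16 / 4) = 2
```

On average these predictions are off by 1, yet the RMSE reports 2, because the single error of 4 is squared before averaging.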

Now I'll guess what you're thinking - "ok, ok Adam, but aren't these numbers just basically the same in practice?". The answer is no! RMSE is always higher than MAE (except for the special case where all the errors have the same magnitude, in which case RMSE equals MAE). Let me show you an example with R.
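
Before the example, here is a short sketch of why RMSE can never be smaller than MAE. Write e_i = y_i - h(x_i) for the individual errors (the symbol e_i is shorthand introduced here, not part of the formulas above). The gap between the squared metrics is then the variance of the absolute errors:

$$\text{RMSE}^2 - \text{MAE}^2 = \frac{1}{n}\sum_{i=1}^{n} |e_i|^2 - \left(\frac{1}{n}\sum_{i=1}^{n} |e_i|\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(|e_i| - \text{MAE}\right)^2 \ge 0$$

A variance is zero only when every |e_i| is identical, which is the equality case. The more the error sizes vary, the further RMSE rises above MAE.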

First we'll generate y, the true value. I'll sample some integers to do this.

In [1]:
y <- sample(c(3,4,5,9),size=100,replace=TRUE,prob=c(.2,.3,.2,.3))
In [2]:
  1. 9
  2. 3
  3. 9
  4. 9
  5. 4

Now I'll make x, the estimate of the true value. I'll just add noise generated from a standard normal random variable to y.

In [3]:
x <- y + rnorm(100)

Now let's see what the RMSE and the MAE are for this estimate.

In [4]:
paste("RMSE is",mean((x-y)^2)^.5)
'RMSE is 0.934173542011242'
In [5]:
paste("MAE is",mean(abs(x-y)))
'MAE is 0.770069705389762'
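
The gap between these two numbers is roughly what theory suggests for standard normal noise, where the expected absolute value is sqrt(2/pi), about 0.798, while the root mean square is 1. A minimal sketch that checks this on a much larger sample follows. The seed and sample size are arbitrary choices and are not from the session above, so the printed values will not match the ones shown there.

```r
set.seed(1)             # arbitrary seed, only for reproducibility
noise <- rnorm(1e5)     # standard normal errors, as in x <- y + rnorm(100)

rmse_noise <- sqrt(mean(noise^2))   # should be close to 1
mae_noise  <- mean(abs(noise))      # should be close to sqrt(2 / pi)

paste("RMSE is", rmse_noise)
paste("MAE is", mae_noise)
paste("sqrt(2/pi) is", sqrt(2 / pi))
```

With only 100 points, as in the example above, both numbers wobble more around these values.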

So the two numbers differ noticeably: here RMSE is about 21% higher than MAE. The difference grows when there are any large errors. For example, the first true value is 9; if we change its estimate to 20, we will see an even larger discrepancy between MAE and RMSE:

In [6]:
x[1] <- 20
In [7]:
paste("RMSE is",mean((x-y)^2)^.5)
'RMSE is 1.4427000400426'
In [8]:
paste("MAE is",mean(abs(x-y)))
'MAE is 0.876468592985346'
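
To see why one bad estimate moves RMSE so much more than MAE, it may help to look at how much of each total comes from that single point. The sketch below uses a made-up error vector (99 errors of magnitude 1 plus one error of 11, echoing the 20 - 9 above), not the random data from this session.

```r
# Hypothetical errors: 99 of magnitude 1 (alternating sign) and one of 11
errors <- c(rep(c(-1, 1), length.out = 99), 11)

mae_val  <- mean(abs(errors))      # (99 * 1 + 11) / 100 = 1.1
rmse_val <- sqrt(mean(errors^2))   # sqrt((99 * 1 + 121) / 100) = sqrt(2.2)

# Share of each total contributed by the single large error
share_mae <- abs(errors[100]) / sum(abs(errors))   # 11 / 110 = 0.10
share_mse <- errors[100]^2 / sum(errors^2)         # 121 / 220 = 0.55
```

The one large error accounts for 10% of the total absolute error but 55% of the total squared error, so it dominates RMSE while only nudging MAE.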

Now, let me show you the writings that I've found that say that RMSE is "how much error we expect to see on average":

  • "The RMSE was used to measure the performance of algorithms. CineMatch has an RMSE of approximately 0.95; i.e., the typical rating would be off by almost one full star." - Mining Massive Data Sets pg. 337

  • "Standard error (SE) is a measure of how far we expect the estimate to be off, on average. For each simulated experiment, we compute the error, x − µ, and then compute the root mean squared error (RMSE)." - Think Stats 2

  • "When two data sets—one set from theoretical prediction and the other from actual measurement of some physical variable, for instance—are compared, the RMS of the pairwise differences of the two data sets can serve as a measure how far on average the error is from 0." - Wikipedia

All of these authors made the same mistake. RMSE is not an estimate of average error. It measures something different. It can be very sensitive to large errors and it should NOT be interpreted as how much an estimate is off on average.

Written on November 13, 2016