
I am forecasting electrical power output, and I have different data sets ranging from 200 to 4000 observations. I have calculated the forecasts, but I do not know how to calculate the RMSE value and R (the correlation coefficient) in R. I tried calculating the RMSE in Excel and got 0.0078, so I basically have two questions here.

  1. How do I calculate the RMSE and R values in R?
  2. What is a good RMSE value? Is 0.0078 a reasonably good value?
  • If you have a model, try `sqrt(sum(resid(model)^2))`. And a value is not good on its own; it's good when compared to others obtained from other fitted models. – Rui Barradas Apr 24 '21 at 19:29
  • @RuiBarradas, post as answer? – Ben Bolker Apr 24 '21 at 19:36
  • For part 1, do any of these answer your question? https://stackoverflow.com/a/26237921/6851825 and https://stackoverflow.com/a/35916901/6851825 and https://stackoverflow.com/a/43123619/6851825 – Jon Spring Apr 24 '21 at 20:43

2 Answers

1

Here are two functions: one computes the MSE, and the second calls the first and takes its square root, giving the RMSE.

These functions accept a fitted model, not a data set: for instance, the output of `lm`, `glm`, and many others.

mse <- function(x, na.rm = TRUE, ...){
  # x is a fitted model object (lm, glm, ...); resid() extracts its residuals
  e <- resid(x)
  mean(e^2, na.rm = na.rm)
}
rmse <- function(x, ...) sqrt(mse(x, ...))
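
For example, a minimal sketch using R's built-in mtcars data (the model here is purely illustrative):

fit <- lm(mpg ~ wt, data = mtcars)  # any fitted model with a resid() method works
rmse(fit)                           # RMSE of the fitted model's residuals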

As I said in a comment to the question, a value is not good on its own; it's good when compared to values obtained from other fitted models.

Rui Barradas
  • Thank you. I made predictions and calculated the RMSE; the value is 0.60, but in the plot the predictions do not follow the observed data closely and show some deviations. Does that mean my model does not fit well, or is it something else? – Sohrab Khan Apr 25 '21 at 15:58
  • @SohrabKhan That's impossible for us to tell. If the data are more spread out around the fitted line, the RMSE will be bigger even though the model is good. Or maybe other models are a better choice. As I said, the RMSE value in itself is not a criterion. Have you tried plotting the residuals? Q-Q plots, histograms, etc.? – Rui Barradas Apr 25 '21 at 16:39
0

Root Mean Square Error (RMSE) is the standard deviation of the prediction errors. Prediction errors (residuals) measure how far the data points are from the regression line; RMSE measures how spread out these residuals are. In other words, it tells you how concentrated the data are around the line of best fit. Root mean square error is commonly used in climatology, forecasting, and regression analysis to verify experimental results.

The formula is:

RMSE = sqrt(bar((f - o)^2))

where:

f = forecasts (expected values or unknown results),
o = observed values (known results).

The bar above the squared differences denotes the mean (similar to x̄). The same formula can be written with the following, slightly different, notation:

RMSE = sqrt((1/N) * Σ (z_fi - z_oi)^2)

where:

Σ = summation ("add up"),
(z_fi - z_oi)^2 = squared differences,
N = sample size.
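
For example, with hypothetical forecasts f = (2, 3, 4) and observed values o = (1, 3, 5): the differences are 1, 0, -1; the squared differences are 1, 0, 1; their mean is 2/3 ≈ 0.667; and the RMSE is sqrt(0.667) ≈ 0.82.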

You can use whichever form you want, as both express the same quantity. The "R" you are referring to is the Pearson correlation coefficient, which measures the strength of the linear relationship between the forecasts and the observed values; in R it is computed with `cor()`.
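
If you have your forecasts and the corresponding observed values in two numeric vectors, here is a minimal sketch (the vectors and their names below are made-up placeholders for your own data):

# made-up numbers standing in for your forecasts and observations
forecast <- c(10.0, 11.9, 9.5, 12.4, 10.7)
observed <- c(10.2, 11.5, 9.8, 12.1, 10.9)

# RMSE: square root of the mean squared forecast error
sqrt(mean((forecast - observed)^2))

# R: Pearson correlation between forecasts and observations
cor(forecast, observed)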

Coming to question 2: a good RMSE value always depends on the lower and upper bounds (the scale) of your data. A smaller value means a smaller typical error, but the RMSE is best judged relative to that scale and to other models fitted to the same data.

venkatesh
  • (1) The theory looks fine, but it's not what the OP asked about (and if they had, it would be off-topic for SO); (2) you didn't answer their primary question (Q1), "how do I calculate RMSE in R"? Q2 is also off-topic for SO ... (And, while I'm picking on you, you should either write the formulas out in text, e.g. `sqrt(bar((f-o)^2))`, or at least give alt-text alternatives ... – Ben Bolker Apr 24 '21 at 19:58