
I have produced a linear data set and used lm() to fit a model to it. I am now trying to find the MSE of that fit using mse() from the hydroGOF package.

I know the formula for MSE but I'm trying to use this function. What would be the proper way to do so? I have looked at the documentation, but I'm either dumb or it's just worded for people who actually know what they're doing.

library(hydroGOF)

x.linear <- seq(0, 200, by=1) # x data
error.linear <- rnorm(n=length(x.linear), mean=0, sd=1) # Error (0, 1)
y.linear <- x.linear + error.linear  # y data

training.data <- data.frame(x.linear, y.linear)
training.model <- lm(training.data)
training.mse <- mse(training.model, training.data)

plot(training.data)

mse() needs two data frames. I'm not sure how to get a data frame out of lm(). Am I even on the right track to finding a proper MSE for my data?

Dan
  • @ZheyuanLi I'm more-or-less asking where my predicted/simulated set of Y values can come from for the formula. In the `mse()` function, it requires an observed and simulated data frame. I need to know what to use for both those data frames. – Dan Sep 27 '16 at 18:43
  • I don't know why you'd use this weird function instead of `mean(training.model$residuals ^ 2)` – Gregor Thomas Sep 27 '16 at 18:43
  • You can get the fitted values from the model, `training.model$fitted.values`, but they are a vector, not a data frame. So I suppose the alternative is `hydroGOF::mse(data.frame(training.model$fitted.values), training.data[["y.linear"]])`... also I'd **strongly** recommend specifying a formula when fitting a model. As you have it I think you're regressing `x` on `y`, which is probably not what you want. – Gregor Thomas Sep 27 '16 at 18:47
  • @ZheyuanLi I think you guys are right, I'll just do it the old fashioned way – Dan Sep 27 '16 at 18:53
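
Pulling the suggestions from these comments together, here is a minimal sketch. It reuses `training.data` from the question; the `sim`/`obs` argument order for `hydroGOF::mse()` follows the package's simulated-versus-observed convention:

library(hydroGOF)

# Refit with an explicit formula so y.linear is the response, not x.linear
training.model <- lm(y.linear ~ x.linear, data = training.data)

# hydroGOF::mse() compares a "simulated" series against an "observed" one;
# here the fitted values play the role of the simulation
training.mse <- mse(sim = fitted(training.model), obs = training.data$y.linear)

# Sanity check by hand
mean(residuals(training.model)^2)

Both values should agree, since the fitted values differ from the observations by exactly the residuals.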

4 Answers

Try this:

mean((training.data$y.linear - predict(training.model))^2)
#[1] 0.4467098
Sandipan Dey
  • I was advised to use the `mse()` function but this is a way I'm more comfortable with. Thank you! – Dan Sep 27 '16 at 19:02
  • Special care needs to be taken when calculating MSE for multiple linear regression. The denominator to calculate MSE is `n - (p+1)`, where p is the number of predictors. Here, in the case of simple linear regression, p = 1, so the denominator becomes `n - 2` (see the sketch below this comment). – Quazi Irfan Apr 07 '20 at 09:11
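
A quick sketch of the distinction in that comment, reusing `training.model` from the question (`lm()` stores the residual degrees of freedom, n - (p + 1), as `training.model$df.residual`):

n   <- length(residuals(training.model))
p   <- length(coef(training.model)) - 1   # coef() also counts the intercept
sse <- sum(residuals(training.model)^2)
sse / n                                   # what mean(residuals^2) computes
sse / (n - (p + 1))                       # residual mean square, here divided by n - 2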

You can also use the Metrics package, which gives a clean way to get the mean squared error:

install.packages("Metrics")   # install once
library(Metrics)
mse(actual, predicted)        # actual = observed values, predicted = model predictions

The first argument is the actual (observed) data from training.data; the second is the predicted values, like so:

pd <- predict(training.model, training.data)
mse(training.data$y.linear, pd)

It seems you have not made predictions yet, so first predict from your model and then calculate the MSE.

Vineet

You can use the residuals stored in the lm model object to find the MSE like this:

mse = mean(training.model$residuals^2)
Namrata Tolani

Note: if you come from another program (like SAS), the mean there is computed from the sum of squared residuals and the residual degrees of freedom. I recommend doing the same if you want an unbiased estimate of the error variance.

mse = sum(training.model$residuals^2)/training.model$df.residual

I found this while trying to figure out why mean(my_model$residuals^2) in R differed from the MSE reported by SAS.

Carlos Mercado