
I have produced a linear data set and used lm() to fit a model to it. I am now trying to find the MSE of that fit using mse() from the hydroGOF package.

I know the formula for MSE but I'm trying to use this function. What would be the proper way to do so? I have looked at the documentation, but I'm either dumb or it's just worded for people who actually know what they're doing.

library(hydroGOF)

x.linear <- seq(0, 200, by=1) # x data
error.linear <- rnorm(n=length(x.linear), mean=0, sd=1) # Error (0, 1)
y.linear <- x.linear + error.linear  # y data

training.data <- data.frame(x.linear, y.linear)
training.model <- lm(training.data)
training.mse <- mse(training.model, training.data)

plot(training.data)

mse() needs two data frames. I'm not sure how to get a data frame out of lm(). Am I even on the right track to finding a proper MSE for my data?

Dan
  • @ZheyuanLi I'm more-or-less asking where my predicted/simulated set of Y values can come from for the formula. In the `mse()` function, it requires an observed and simulated data frame. I need to know what to use for both those data frames. – Dan Sep 27 '16 at 18:43
  • I don't know why you'd use this weird function instead of `mean(training.model$residuals ^ 2)` – Gregor Thomas Sep 27 '16 at 18:43
  • You can get the fitted values from the model, `training.model$fitted.values`, but they are a vector, not a data frame. So I suppose the alternative is `hydroGOF::mse(data.frame(training.model$fitted.values), training.data[["y.linear"]])`... also I'd **strongly** recommend specifying a formula when fitting a model. As you have it I think you're regressing `x` on `y`, which is probably not what you want. – Gregor Thomas Sep 27 '16 at 18:47
  • @ZheyuanLi I think you guys are right, I'll just do it the old fashioned way – Dan Sep 27 '16 at 18:53
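
Pulling the suggestions from these comments together, here is a minimal sketch. It reuses `training.data` from the question; the `sim`/`obs` argument order for `hydroGOF::mse()` follows the package's simulated-versus-observed convention:

library(hydroGOF)

# Refit with an explicit formula so y.linear is the response, not x.linear
training.model <- lm(y.linear ~ x.linear, data = training.data)

# hydroGOF::mse() compares a "simulated" series against an "observed" one;
# here the fitted values play the role of the simulation
training.mse <- mse(sim = fitted(training.model), obs = training.data$y.linear)

# Sanity check by hand
mean(residuals(training.model)^2)

Both values should agree, since the fitted values differ from the observations by exactly the residuals.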

4 Answers

Try this:

mean((training.data$y.linear - predict(training.model))^2)
#[1] 0.4467098
Sandipan Dey
  • I was advised to use the `mse()` function but this is a way I'm more comfortable with. Thank you! – Dan Sep 27 '16 at 19:02
  • Special care needs to be taken when calculating MSE for multiple linear regression. The denominator to calculate MSE is `n - (p+1)`, where p is the number of predictors. Here, in the case of simple linear regression, p = 1, so the denominator becomes `n - 2` (see the sketch below this comment). – Quazi Irfan Apr 07 '20 at 09:11
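
A quick sketch of the distinction in that comment, reusing `training.model` from the question (`lm()` stores the residual degrees of freedom, n - (p + 1), as `training.model$df.residual`):

n   <- length(residuals(training.model))
p   <- length(coef(training.model)) - 1   # coef() also counts the intercept
sse <- sum(residuals(training.model)^2)
sse / n                                   # what mean(residuals^2) computes
sse / (n - (p + 1))                       # residual mean square, here divided by n - 2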

You can also use the Metrics package, which gives a clean way to get the mean squared error:

install.packages("Metrics")   # install once
library(Metrics)
mse(actual, predicted)        # actual = observed values, predicted = model predictions

The first argument is the actual (observed) data from training.data; the second is the predicted values, like so:

pd <- predict(training.model, training.data)
mse(training.data$y.linear, pd)

It seems you have not made predictions yet, so first predict from your model and then calculate the MSE.

Vineet

You can use the residuals stored in the lm model object to find the MSE like this:

mse = mean(training.model$residuals^2)
Namrata Tolani

Note: if you come from another program (like SAS), the mean there is computed from the sum of squared residuals and the residual degrees of freedom. I recommend doing the same if you want an unbiased estimate of the error variance.

mse = sum(training.model$residuals^2)/training.model$df.residual

I found this while trying to figure out why mean(my_model$residuals^2) in R differed from the MSE reported by SAS.

Carlos Mercado