
I ran an lm() in R and this is the result of the summary:

Multiple R-squared:  0.8918,    Adjusted R-squared:  0.8917 
F-statistic:  9416 on 9 and 10283 DF,  p-value: < 2.2e-16

and it seems to be a good model, but if I calculate the R^2 manually I obtain this:

model=lm(S~0+C+HA+L1+L2,data=train)
pred=predict(model,train)
rss <- sum((model$fitted.values - train$S) ^ 2)
tss <- sum((train$S - mean(train$S)) ^ 2)
1 - rss/tss
##[1] 0.247238
rSquared(train$S, train$S - model$fitted.values)  # rSquared() from the miscTools package
##          [,1]
## [1,] 0.247238

What's wrong?

str(train[,c('S','C','HA','L1','L2')])
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   10292 obs. of  5 variables:
 $ S         : num  19 18 9 12 12 8 21 24 9 8 ...
 $ C         : Factor w/ 6 levels "D","E","F","I",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ HA        : Factor w/ 2 levels "A","H": 1 2 1 1 2 1 2 2 1 2 ...
 $ L1        : num  0.99 1.41 1.46 1.43 1.12 1.08 1.4 1.45 0.85 1.44 ...
 $ L2        : num  1.31 0.63 1.16 1.15 1.29 1.31 0.7 0.65 1.35 0.59 ...
Ben Bolker

1 Answer


You are running a model without an intercept (the ~0 on the right-hand side of your formula). For these kinds of models the usual calculation of R^2 is problematic and produces misleading values. This post explains it very well: https://stats.stackexchange.com/a/26205/99681
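A minimal sketch of the discrepancy, using simulated data (hypothetical; your train set behaves the same way): for a no-intercept fit, summary() computes R^2 with an *uncentered* total sum of squares, sum(y^2), while the manual calculation centers on mean(y).

```r
# Simulated example: why summary() and the manual R^2 disagree
# for a model fitted without an intercept.
set.seed(1)
x <- runif(100, 1, 10)
y <- 5 + 2 * x + rnorm(100)

m0 <- lm(y ~ 0 + x)   # no intercept, like S ~ 0 + C + HA + L1 + L2

rss <- sum(residuals(m0)^2)
r2_uncentered <- 1 - rss / sum(y^2)             # what summary() reports
r2_centered   <- 1 - rss / sum((y - mean(y))^2) # the "manual" calculation

all.equal(r2_uncentered, summary(m0)$r.squared)  # TRUE
r2_uncentered > r2_centered                      # TRUE: summary()'s value is inflated
```

The uncentered denominator is much larger than the centered one whenever mean(y) is far from zero, which is exactly why the reported 0.89 collapses to 0.25 when computed by hand.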

jludewig
    also https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-why-are-r2-and-f-so-large-for-models-without-a-constant/ and https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-summary_0028_0029-report-strange-results-for-the-R_005e2-estimate-when-I-fit-a-linear-model-with-no-intercept_003f . I wouldn't necessarily blame this on R: R^2 in zero-intercept models is **necessarily** problematic (i.e., there are differently bad solutions, but no good solutions) – Ben Bolker Aug 08 '19 at 15:48
  • @BenBolker good point. I edited my answer to better reflect that – jludewig Aug 08 '19 at 15:53
  • The point is that R squared is based on comparing a model to a minimal submodel. In the case that the model has an intercept the logical submodel to compare it to is the model that contains only the intercept, i.e. y ~ 1 in R's model notation; however, if the model has no intercept then that is not a submodel any more and the logical submodel to use is y ~ 0. That is why different formulas are needed for R squared. – G. Grothendieck Aug 08 '19 at 16:02
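The submodel view in that comment can be sketched directly in R (simulated data, hypothetical names): the baseline for a model with an intercept is y ~ 1, while the baseline for a no-intercept model is y ~ 0, whose residual sum of squares is sum(y^2).

```r
# R^2 as a comparison against a minimal baseline submodel.
set.seed(1)
x <- runif(100, 1, 10)
y <- 5 + 2 * x + rnorm(100)

m     <- lm(y ~ x)      # with intercept
m0    <- lm(y ~ 0 + x)  # without intercept
base1 <- lm(y ~ 1)      # baseline for the intercept model (predicts mean(y))
base0 <- lm(y ~ 0)      # baseline for the no-intercept model (predicts 0)

# deviance() on an lm fit is the residual sum of squares.
r2_with    <- 1 - deviance(m)  / deviance(base1)
r2_without <- 1 - deviance(m0) / deviance(base0)

all.equal(r2_with,    summary(m)$r.squared)   # TRUE
all.equal(r2_without, summary(m0)$r.squared)  # TRUE
```

Both summary() values are "1 minus RSS over the baseline's RSS"; they differ only in which baseline is a legitimate submodel of the fitted model.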