
Say I want to use lm() to estimate the means of y over k groups, where the groups are defined by a factor.

If I just run lm(y ~ factor), I get an intercept and coefficients for the other k-1 levels of the factor, expressed as differences from the intercept. I want to get the means directly instead.
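To make "differences from the intercept" concrete: with R's default treatment contrasts, the intercept is the mean of the first (reference) level and every other coefficient is that level's mean minus the reference mean. A minimal base-R check on the built-in iris data (the variable names `group_means`, `b` and `means_from_coefs` are just illustrative):

```r
# Default fit with treatment contrasts (first level, setosa, is the reference)
reg1 <- lm(Sepal.Length ~ Species, data = iris)

# Group means computed directly from the data
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Rebuild the means from the coefficients:
# intercept = reference-level mean, other coefficients = differences from it
b <- coef(reg1)
means_from_coefs <- c(b[1], b[1] + b[-1])
names(means_from_coefs) <- levels(iris$Species)

means_from_coefs
all.equal(unname(means_from_coefs), as.vector(group_means))
```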

Is there a clean way to do this with contrasts in lm()? I am not sure what this kind of contrast would be called... orthogonal? I can obviously remove the intercept with lm(y ~ -1 + factor), but this gives me wrong R2 values:

reg1 <- lm(Sepal.Length ~ Species, data = iris)
reg2 <- lm(Sepal.Length ~ -1 + Species, data = iris)

## get coefs
coef(reg1) # not what I want
#>       (Intercept) Speciesversicolor  Speciesvirginica 
#>             5.006             0.930             1.582
coef(reg2) # what I want
#>     Speciessetosa Speciesversicolor  Speciesvirginica 
#>             5.006             5.936             6.588

## The models are equivalent:
all.equal(fitted(reg1), fitted(reg2))
#> [1] TRUE


# but the -1 trick will create problems for some stats, such as R2
summary(reg1)$r.squared
#> [1] 0.6187057
summary(reg2)$r.squared
#> [1] 0.9925426

Created on 2019-05-01 by the reprex package (v0.2.1)

Zheyuan Li
Matifou
  • What do you mean by "wrong r2 values"? You can't have it both ways. This seems like maybe more of a statistics question than a programming question. If you need help understanding how linear regression models work, then you should ask instead at Cross Validated (stats.stackexchange.com), where statistics questions are on topic. This might already explain it: https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo – MrFlick May 01 '19 at 17:22
  • Also a discussion of the same issue here: https://stats.stackexchange.com/questions/171240/how-can-r2-have-two-different-values-for-the-same-regression-without-an-inte/171250#171250 – MrFlick May 01 '19 at 17:23
  • The point about R2 is secondary; my main question is how to get the coefficient values directly with a factor. As for the secondary point, note that the two regressions are exactly the same (only the way the coefficients are labelled changes) and hence give the same sum-of-squares decomposition, SSR_tot = SSR_pred + SSR_res. So one would expect them to give the same R2. – Matifou May 01 '19 at 18:46
  • Looking back at this question, I feel the comment by @MrFlick was not very accurate, if not slightly patronizing. Noting that a "saturated" design corresponds to including an intercept, are you suggesting that two models that are identical in every other respect (same projection, same fitted values, same SSR, etc.) except for the labelling of the coefficients should have different R2 values? – Matifou Oct 17 '22 at 08:50
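The disagreement comes down to which total sum of squares is used. For unweighted fits, summary.lm chooses it from the intercept term of the model formula, not from the column space: the centred sum((y - mean(y))^2) when the formula has an intercept, and the uncentred sum(y^2) when it was removed with -1. A sketch that reproduces both numbers from the question by hand, using the fact that the residual sum of squares is the same for both fits:

```r
reg1 <- lm(Sepal.Length ~ Species, data = iris)
reg2 <- lm(Sepal.Length ~ -1 + Species, data = iris)

y   <- iris$Sepal.Length
rss <- sum(residuals(reg2)^2)   # identical for reg1 and reg2

# Centred total sum of squares: what summary.lm uses when the formula has an intercept
r2_centred   <- 1 - rss / sum((y - mean(y))^2)
# Uncentred total sum of squares: what summary.lm uses when the intercept is removed
r2_uncentred <- 1 - rss / sum(y^2)

c(centred = r2_centred, uncentred = r2_uncentred)
# Compare with summary(reg1)$r.squared and summary(reg2)$r.squared
```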

1 Answer


It is not “orthogonal contrast” but “no contrast at all”.
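The difference shows up directly in the design matrices. With the intercept, R applies the default treatment contrast and keeps k - 1 dummy columns; without it, every level gets its own indicator column and no contrast is applied at all. A small base-R illustration (the names `X1` and `X2` are only for this example):

```r
# Default: intercept column plus k - 1 dummy columns (treatment contrast)
X1 <- model.matrix(~ Species, data = iris)
colnames(X1)

# Intercept removed: one indicator column per level, i.e. no contrast
X2 <- model.matrix(~ Species - 1, data = iris)
colnames(X2)

# The indicators sum to 1 in every row, so a constant column
# still lies in the column space of X2
all(rowSums(X2) == 1)

# The contrast matrix R uses by default for Species (contr.treatment)
contrasts(iris$Species)
```

The last point is why both fits have the same fitted values: the two design matrices span the same space.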

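Regarding the incorrect R squared: summary.lm computes this quantity differently depending on whether the model formula explicitly contains an intercept. With an intercept it uses the centred total sum of squares; without one (as in reg2) it uses the uncentred sum of squares of the response, which inflates R squared. You may want to compute R squared manually in this case: cor(iris$Sepal.Length, fitted(reg2))^2. See this comment.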
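Putting the two parts together, a minimal sketch: take the group means from the no-intercept fit and recompute R squared as the squared correlation between response and fitted values. This identity holds here because the full set of level indicators spans the constant, so reg2 implicitly contains an intercept; it would not hold for a genuinely intercept-free model.

```r
reg1 <- lm(Sepal.Length ~ Species, data = iris)
reg2 <- lm(Sepal.Length ~ -1 + Species, data = iris)

# Coefficients of reg2 are the group means directly
coef(reg2)

# R squared recomputed by hand for reg2
r2_manual <- cor(iris$Sepal.Length, fitted(reg2))^2
r2_manual   # matches summary(reg1)$r.squared
```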

NelsonGon
Zheyuan Li
  • Thanks! So is there a way in R to specify "no contrast at all" without manually removing the intercept? That would avoid the manual R2 correction. – Matifou May 01 '19 at 19:00
  • @Matifou Contrasts can be disabled, see [this Q&A](https://stackoverflow.com/q/41032858/4891738). However, that will not achieve the result you hope for. The Q&A provides rich information on how factor covariates are handled in regression. – Zheyuan Li May 01 '19 at 19:10
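Why disabling the contrast does not help: if the intercept is kept and Species is given a full identity "contrast" (here built with `contrasts(..., contrasts = FALSE)` and passed through lm()'s `contrasts` argument, an assumption based on the approach in the linked Q&A), the design has 1 + k columns that are linearly dependent. lm() then reports one coefficient as NA, so the output is not the k group means, although R squared is the same as for reg1 because the formula still has an intercept. A sketch (`no_contr` and `reg3` are illustrative names):

```r
reg1 <- lm(Sepal.Length ~ Species, data = iris)

# Full set of level indicators for Species (contrasts disabled)
no_contr <- contrasts(iris$Species, contrasts = FALSE)

# Intercept kept + one column per level: 4 linearly dependent columns
reg3 <- lm(Sepal.Length ~ Species, data = iris,
           contrasts = list(Species = no_contr))

coef(reg3)                  # one coefficient is NA (aliased)
summary(reg3)$r.squared     # same R squared as summary(reg1)$r.squared
```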