My overarching question is: how does R calculate R^2 in the weighted least squares (WLS) case? It doesn't simply weight the observations and then calculate R^2. To figure this out, I went through the source code until I ran into this line in lm.wfit:

z <- .Call(C_Cdqrls, x * wts, y * wts, tol)

What is being done here? Does anyone know how I can access the code behind this call to get to the details? That is, what is being returned to z, and how are C_Cdqrls, x * wts, y * wts, and tol being used?

What I understand so far (and I'm not sure it's right) is that .Call means R is executing compiled C code. I'd like to see how this is done in C if possible.

Thanks!

flodel
user722224

2 Answers

The R-squared value is actually calculated when you call summary.lm. You can look at the source code for any function, either in the actual SVN repository (https://svn.r-project.org/R/) or in the read-only mirror on GitHub.

Looking at summary.lm in https://github.com/wch/r-source/blob/trunk/src/library/stats/R/lm.R, we see the following, which accounts for the weights (w):

r <- z$residuals
f <- z$fitted.values
w <- z$weights
if (is.null(w)) {
    mss <- if (attr(z$terms, "intercept"))
        sum((f - mean(f))^2) else sum(f^2)
    rss <- sum(r^2)
} else {
    mss <- if (attr(z$terms, "intercept")) {
        m <- sum(w * f / sum(w))
        sum(w * (f - m)^2)
    } else sum(w * f^2)
    rss <- sum(w * r^2)
    r <- sqrt(w) * r
}
# ..... some other code
# ... then this definition
ans$r.squared <- mss/(mss + rss)
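
To check the formula above by hand, here is a minimal sketch that rebuilds the weighted R^2 from a fitted model and compares it with what summary.lm reports. The data and weights are simulated purely for illustration (they are assumptions, not from the question), and the model has an intercept, so the weighted-mean branch applies.

```r
# Sketch: reproduce summary.lm's weighted R^2 by hand (simulated data).
set.seed(1)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
w <- runif(n, 0.5, 2)            # hypothetical positive weights

fit <- lm(y ~ x, weights = w)
r <- residuals(fit)              # unweighted residuals, i.e. fit$residuals
f <- fitted(fit)                 # fitted values, i.e. fit$fitted.values

m   <- sum(w * f) / sum(w)       # weighted mean of the fitted values
mss <- sum(w * (f - m)^2)        # weighted model sum of squares
rss <- sum(w * r^2)              # weighted residual sum of squares
r2_manual <- mss / (mss + rss)

# Compare with the value computed inside summary.lm
all.equal(r2_manual, summary(fit)$r.squared)
```

Note that the weights enter through sum(w * ...), applied to residuals and fitted values that are themselves on the original (unscaled) scale.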
mnel
  • thank you very much for taking the time to respond to this. After reading my question over again, I realized my question was unclear. I discovered the code above before and am actually trying to replicate that manually. Obviously, all the variables are defined there (r,f,w,rss,mss), however r and f are defined in terms of z (z$residuals and z$fitted.values). A look at the source code shows that z is calculated as follows: z <- .Call(C_Cdqrls, x *wts, y*wts, tol). This is the step I am confused about - I don't understand how z is being calculated. Thanks again for looking! – user722224 Apr 03 '13 at 15:35
  • I realize I could also just run lm on the weighted values and obtain z$residuals and z$fitted.values, and then see if I can validate the results. However, I'd really like to know what is going on behind the scenes here if possible. Thank you. – user722224 Apr 03 '13 at 15:43

A Google search quickly produced this:

https://svn.r-project.org/R/trunk/src/library/stats/src/lm.c
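
For readers who would rather not read the C, here is a rough R-level sketch of what that call amounts to. In lm.wfit, wts is sqrt(w), so the C routine solves an ordinary least-squares problem on the rescaled x and y using a QR decomposition; lm.wfit then divides the residuals by wts to put them back on the original scale. The sketch uses qr(), qr.coef(), and qr.resid() as stand-ins for the C routine (an assumption made for illustration, not a copy of the internals) and checks the result against lm.wfit on simulated data.

```r
# Sketch: mimic the least-squares step of lm.wfit with R's own QR tools.
set.seed(2)
n <- 40
X <- cbind(1, runif(n))                  # design matrix with an intercept column
y <- drop(X %*% c(1, 2)) + rnorm(n)
w <- runif(n, 0.5, 2)                    # hypothetical positive weights
wts <- sqrt(w)                           # lm.wfit scales by the square root of w

qrx        <- qr(X * wts)                # QR of the row-scaled design matrix
beta       <- qr.coef(qrx, y * wts)      # least-squares coefficients
res_scaled <- qr.resid(qrx, y * wts)     # residuals of the scaled problem

ref <- lm.wfit(X, y, w)                  # reference result from stats

all.equal(unname(beta), unname(ref$coefficients))
all.equal(unname(res_scaled / wts), unname(ref$residuals))
all.equal(unname(y - res_scaled / wts), unname(ref$fitted.values))
```

So z$residuals and z$fitted.values, as seen by summary.lm, are on the original scale of y, which is why summary.lm reapplies w when it forms mss and rss.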

IRTFM
  • Thanks DWin. What terms did you google exactly? I think my unfamiliarity with this coding language (is this C?) deterred me from searching properly. Is it correct that I am looking at regression calculations in C? I.e., are the residuals and fitted values calculated here? Thank you for your reply. – user722224 Apr 03 '13 at 15:38
  • If I remember correctly the strategy was: "trunk Cdqrls". "trunk" was chosen because I knew that would be in the URL of the current version of R's source. The second term was chosen slightly accidentally, since I just double-click-copied the module name and it didn't pull in the "C_". – IRTFM Apr 03 '13 at 15:42
  • that's a neat trick - thanks for your insight! – user722224 Apr 03 '13 at 16:01