Weighted Least Squares (WLS) can be computed in R with the lm() command and its weights option. To understand the formula, I also computed the estimates manually, but the results are not the same:
# Sample from model
set.seed(123456789)
x1 <- 10 * rbeta(500, 2, 6)
x2 <- x1 + 2*rchisq(500, 3, 2) + rnorm(500)*sd(x1)
u <- rnorm(500)
y <- 0.5 + 1.2*x1 - 0.7*x2 + u
# Produce weights (shouldn't matter how, as long as they are positive)
w <- x1/sd(x1)
# Manual WLS
y_WLS <- y*sqrt(w)
x1_WLS <- x1*sqrt(w)
x2_WLS <- x2*sqrt(w)
summary(lm(y_WLS ~ x1_WLS + x2_WLS))
# Automatic WLS
summary(lm(y ~ x1 + x2, weights = w))
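For reference, this is how I put the two sets of coefficients side by side (fit_manual and fit_auto are just names I use here for the two fits above):

# Store the two fits and compare their coefficients directly
fit_manual <- lm(y_WLS ~ x1_WLS + x2_WLS)    # manual transformation
fit_auto   <- lm(y ~ x1 + x2, weights = w)   # built-in weights option
cbind(manual = coef(fit_manual), automatic = coef(fit_auto))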
The two commands give different results, but they should be exactly the same. I am just following the instructions from Wooldridge (2019), Section 8.4a, the relevant bits of which I have combined in the image below:
As you can read there, WLS with weight w is equivalent to running OLS on a transformed model in which each variable has been multiplied by the square root of w, which is what I do above. Why the difference, then?
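For completeness, this is the transformed equation as I read it from that section, written in my own notation rather than quoted from the book (with $w_i$ the weight for observation $i$):

$$\sqrt{w_i}\, y_i = \beta_0 \sqrt{w_i} + \beta_1 \left(\sqrt{w_i}\, x_{1i}\right) + \beta_2 \left(\sqrt{w_i}\, x_{2i}\right) + \sqrt{w_i}\, u_i,$$

and my understanding is that running OLS on these transformed quantities should reproduce the lm(..., weights = w) estimates exactly.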