
I understand that lm treats weights as "analytic" weights: observations are only weighted relative to each other (e.g., lm weighs an observation with weight = 2 twice as much as one with weight = 1), and the model's overall N is unaffected. "Frequency" weights, on the other hand, would allow the model's N to differ from the actual number of observations in the data.
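To make the distinction concrete, here is a small base-R sketch (made-up four-row data using the same x/y values as the example below). Giving every row weight 2 in lm() is not the same as having each row twice: the coefficients agree, but lm() keeps N at 4, so the residual degrees of freedom and the standard errors differ.

```r
# Four distinct observations
d4 <- data.frame(x = c(0, 0, 1, 1), y = c(10, 20, 40, 50))

# lm() with weights = 2: N stays 4, residual df = 4 - 2
fit_w <- lm(y ~ x, data = d4, weights = rep(2, 4))

# Frequency-weight equivalent: literally duplicate each row
d8 <- d4[rep(seq_len(nrow(d4)), each = 2), ]
fit_rep <- lm(y ~ x, data = d8)

coef(fit_w)    # same point estimates...
coef(fit_rep)
summary(fit_w)$coefficients[, "Std. Error"]    # ...but different SEs
summary(fit_rep)$coefficients[, "Std. Error"]
df.residual(fit_w)    # 2
df.residual(fit_rep)  # 6
```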

People have asked about frequency weights in R before, but as far as I can tell, those questions were about survey data. I am not using survey data here.

I'd like to implement frequency weights that are less than 1 and that make the model's N smaller than the actual number of rows in the data. For example, if nrow(df) = 8 and every observation has weight = 0.5, the model N should be 4, and the standard errors should reflect this difference. As far as I can tell, the weights argument of base R's lm can't be used this way:

library(tidyverse)
library(broom)

df.unweighted <- tribble(
  ~x, ~y, ~w,
  0, 10, 1,
  0, 20, 1,
  1, 40, 1,
  1, 50, 1,
) %>%
  bind_rows(., .) # make twice as large

df.weighted <- df.unweighted %>%
  mutate(w = 0.5)

lm(data=df.unweighted, y~x, weights=w) %>%
  tidy
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)      15.      2.89      5.20 0.00202 
#> 2 x                30       4.08      7.35 0.000325

lm(data=df.weighted, y~x, weights=w) %>%
  tidy
#> # A tibble: 2 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     15.       2.89      5.20 0.00202 
#> 2 x               30.0      4.08      7.35 0.000325

# identical estimates and standard errors
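Why halving every weight changes nothing: in lm's weighted least squares, with p coefficients and W = diag(w_1, ..., w_n), the residual variance is divided by n - p, and a common factor c on all weights cancels (replacing W by cW leaves the estimates unchanged, multiplies the weighted RSS by c and multiplies (X^T W X)^{-1} by 1/c):

```latex
\hat\beta = (X^\top W X)^{-1} X^\top W y, \qquad
\hat\sigma^2_{\text{lm}} = \frac{\sum_{i=1}^{n} w_i \hat e_i^2}{n - p}, \qquad
\widehat{\operatorname{Var}}_{\text{lm}}(\hat\beta) = \hat\sigma^2_{\text{lm}} \,(X^\top W X)^{-1}
```

What I'm after (my reading of "frequency weights", which matches the Stata output for this example) replaces n with the total weight:

```latex
N = \sum_{i=1}^{n} w_i, \qquad
\hat\sigma^2_{\text{freq}} = \frac{\sum_{i=1}^{n} w_i \hat e_i^2}{N - p}, \qquad
\widehat{\operatorname{Var}}_{\text{freq}}(\hat\beta)
  = \hat\sigma^2_{\text{freq}} \,(X^\top W X)^{-1}
  = \frac{n - p}{N - p}\,\widehat{\operatorname{Var}}_{\text{lm}}(\hat\beta)
```

With n = 8, p = 2 and all w_i = 0.5, N = 4 and the variance factor is 6/2 = 3, so each standard error grows by sqrt(3): 2.89 becomes about 5.00 and 4.08 becomes about 7.07.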

What I'm looking for can be achieved in Stata using importance weights (iweight). Note the model N (Number of obs = 4) and the standard errors:

library(RStata)
stata("reg y x [iweight=w]",
      data.in = df.weighted)
#> . reg y x [iweight=w]
#> 
#>       Source |       SS       df       MS              Number of obs =       4
#> -------------+------------------------------           F(  1,     2) =   18.00
#>        Model |         900     1         900           Prob > F      =  0.0513
#>     Residual |         100     2          50           R-squared     =  0.9000
#> -------------+------------------------------           Adj R-squared =  0.8500
#>        Total |        1000     3  333.333333           Root MSE      =  7.0711
#> 
#> ------------------------------------------------------------------------------
#>            y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
#> -------------+----------------------------------------------------------------
#>            x |         30   7.071068     4.24   0.051    -.4243492    60.42435
#>        _cons |         15          5     3.00   0.095    -6.513264    36.51326
#> ------------------------------------------------------------------------------
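A minimal base-R sketch of one way to get the same numbers, assuming that Stata's [iweight=] in regress treats the total weight as N (which is what the output above shows for this example; I haven't checked it against Stata's documentation for the general case). freq_weight_coefs() is a hypothetical helper, not an existing function: it keeps lm()'s coefficients and weighted residual sum of squares but divides by sum(w) - p instead of df.residual(fit).

```r
# Hypothetical helper: re-derive SEs, t and p values treating sum(w) as N.
# Assumes a full-rank design (no NA coefficients).
freq_weight_coefs <- function(fit) {
  w <- weights(fit)
  if (is.null(w)) stop("fit has no weights")
  df_freq <- sum(w) - fit$rank            # residual df with N = sum(w)
  if (df_freq <= 0) stop("sum(weights) must exceed the number of coefficients")
  # Same weighted RSS as lm(), different denominator
  sigma2 <- sum(w * residuals(fit)^2) / df_freq
  # cov.unscaled is (X'WX)^{-1} for a weighted lm fit
  se <- sqrt(diag(sigma2 * summary(fit)$cov.unscaled))
  est <- coef(fit)
  tval <- est / se
  data.frame(
    term = names(est),
    estimate = unname(est),
    std.error = unname(se),
    statistic = unname(tval),
    p.value = unname(2 * pt(-abs(tval), df_freq)),
    row.names = NULL
  )
}

# Same data as df.weighted above, built without tidyverse
df.weighted <- data.frame(
  x = rep(c(0, 0, 1, 1), 2),
  y = rep(c(10, 20, 40, 50), 2),
  w = 0.5
)
fit <- lm(y ~ x, data = df.weighted, weights = w)
freq_weight_coefs(fit)
```

On this data the std.error column comes out as 5 and 7.071068, matching Stata's Std. Err. column; the p values use a t distribution with sum(w) - p = 2 degrees of freedom, as Stata's output does.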

In my actual use case, observations will not all have the same weight; I used equal weights here only for ease of demonstration.
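Since the real weights vary, here is a sanity check (made-up data, integer weights chosen only so the comparison is possible) that rescaling lm()'s SEs by sqrt(df.residual(fit) / (sum(w) - p)) agrees with genuinely repeating row i w_i times, which is the usual meaning of frequency weights. The rescaling itself never needs integer weights, which is what would make it usable for w < 1.

```r
# Illustrative data with unequal integer weights
d <- data.frame(
  x = c(0, 0, 1, 1, 2, 2),
  y = c(10, 20, 40, 50, 55, 75),
  w = c(1, 3, 2, 1, 2, 1)
)

# Weighted fit, SEs rescaled from n - p to sum(w) - p
fit_w <- lm(y ~ x, data = d, weights = w)
df_freq <- sum(d$w) - fit_w$rank
se_rescaled <- summary(fit_w)$coefficients[, "Std. Error"] *
  sqrt(df.residual(fit_w) / df_freq)

# True frequency weights: repeat each row w times, fit unweighted
d_expanded <- d[rep(seq_len(nrow(d)), times = d$w), ]
fit_expanded <- lm(y ~ x, data = d_expanded)
se_expanded <- summary(fit_expanded)$coefficients[, "Std. Error"]

all.equal(se_rescaled, se_expanded)  # TRUE
```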
