I am running a linear regression with many levels of fixed effects, and it takes a long time to run.

For efficiency, I would like to:

  1. demean the variables once (for this I am using demeanlist() from the lfe package)
  2. save the demeaned matrix
  3. run lm.fit() on the demeaned matrix instead of lm() (the dataset has more than 50 million rows)
  4. save the output from lm.fit()
  5. apply an SE correction to that output to account for
    1. clustering/heteroskedasticity (ideally I would like to try different corrections here without having to rerun the model every time)
    2. the true number of DoF, which is lower than the default in lm.fit(), since lm.fit() does not know about the demeaning step
  6. output to LaTeX with stargazer
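To make step 5 concrete, here is a minimal sketch in base R of what the corrections could look like once the lm.fit() output is saved. It is an illustration under stated assumptions, not a definitive implementation: the data are simulated, there is a single fixed effect `g` demeaned with `ave()` (standing in for demeanlist()), the cluster id `cl` and the helper `coef_table()` are hypothetical, and the small-sample factors follow one common convention among several.

```r
# Simulated data: one fixed effect `g` and a separate cluster id `cl` (both hypothetical).
set.seed(1)
n  <- 200
g  <- factor(rep(1:10, each = 20))
cl <- factor(rep(1:40, each = 5))
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1.5 * x1 - 0.5 * x2 + as.numeric(g) + rnorm(n)

# Steps 1-4: demean once, fit once, keep the pieces.
demean <- function(v, f) v - ave(v, f)        # subtract group means
X   <- cbind(x1 = demean(x1, g), x2 = demean(x2, g))
fit <- lm.fit(X, demean(y, g))
b   <- fit$coefficients
e   <- fit$residuals

# Step 5.2: residual degrees of freedom net of the absorbed fixed effects.
# With one fixed effect this is the number of levels; with several fixed
# effects the absorbed count is smaller than the sum of levels.
k       <- ncol(X)
n_fe    <- nlevels(g)
df_true <- n - k - n_fe                       # lm.fit() reports n - k

# Step 5.1: variance estimators that all reuse the same fit.
bread <- solve(crossprod(X))                  # (X'X)^{-1}, only k x k
V_iid <- sum(e^2) / df_true * bread           # classical
V_hc1 <- n / df_true * bread %*% crossprod(X * e) %*% bread   # HC1-style
S     <- rowsum(X * e, cl)                    # scores summed within clusters
G     <- nlevels(cl)
V_cl  <- G / (G - 1) * (n - 1) / df_true *
  bread %*% crossprod(S) %*% bread            # cluster-robust

# Hypothetical helper: coefficient table for any of the matrices above.
coef_table <- function(b, V, df) {
  se <- sqrt(diag(V))
  tv <- b / se
  cbind(Estimate = b, `Std. Error` = se, `t value` = tv,
        `Pr(>|t|)` = 2 * pt(abs(tv), df, lower.tail = FALSE))
}

se_iid <- sqrt(diag(V_iid))
print(coef_table(b, V_hc1, df_true))
```

Only `b`, `e`, and the k x k matrix `crossprod(X)` are needed after the fit, so switching between estimators does not require rerunning the regression. For clustered SEs, some conventions use G - 1 rather than `df_true` for the t distribution.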

Steps 1-4 work, and now I am wondering how to tackle step 5. Ideally step 6 as well, though that is minor.

I am of course open to alternatives for step 4. I do not strictly have to run lm.fit(); fastLm() or felm() would also be fine.

EDIT: Minimal self-contained example

library(fastDummies)
library(lfe)  # provides demeanlist() and felm(); there is no separate "felm" package

data <- data.frame(
  author     = c("a","a","a","a","b","b","b","b","c","c","c","c"),
  date       = c(1,1,2,2,1,1,2,2,1,1,2,2),
  sub        = c("political", "general", "political", "general",
                 "political", "general", "political", "general",
                 "political", "general", "political", "general"),
  treatment1 = c(1,0,0,0,0,1,0,0,0,1,1,1),
  outcome    = c(0,2,5,5,7,0,1,1,23,3,10,11),
  treatment2 = c(1,1,0,0,1,0,0,1,0,0,0,0)
)


yX <- data[,c("treatment1", "treatment2", "outcome")]


cx <- demeanlist(yX, list(as.factor(data$author), as.factor(data$sub), as.factor(data$date)))


x <- lm.fit(as.matrix(cx[,1:2]), as.matrix(cx[3]))

I now want a summary of x that I can output to LaTeX, in which I can correct the DoF and use either clustered or heteroskedasticity-robust SEs.
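For comparison, this is a sketch of the felm() alternative mentioned above, on the same toy data. It assumes the lfe package is installed and uses felm()'s multi-part formula (covariates | fixed effects | instruments | cluster variable); clustering on `author` is only an illustrative choice, and with three clusters the resulting SEs are not meaningful.

```r
library(lfe)

# Same toy data as in the example, with the fixed-effect variables as factors.
data <- data.frame(
  author     = factor(rep(c("a", "b", "c"), each = 4)),
  date       = factor(rep(c(1, 1, 2, 2), times = 3)),
  sub        = factor(rep(c("political", "general"), times = 6)),
  treatment1 = c(1,0,0,0,0,1,0,0,0,1,1,1),
  outcome    = c(0,2,5,5,7,0,1,1,23,3,10,11),
  treatment2 = c(1,1,0,0,1,0,0,1,0,0,0,0)
)

# Fixed effects are projected out internally; 0 means no instruments;
# the last part names the cluster variable (illustrative choice).
m <- felm(outcome ~ treatment1 + treatment2 | author + sub + date | 0 | author,
          data = data)
print(summary(m))

# Reference fit with explicit dummies, to check the point estimates.
ref <- lm(outcome ~ treatment1 + treatment2 + author + sub + date, data = data)
```

The point estimates should agree with the lm.fit() result on the demeaned matrix, while the SEs and DoF come from felm() itself. As far as I know, stargazer accepts felm objects, so this route might also cover step 6, but I have not verified which SEs it reports.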

  • You might be interested in `lmtest::coeftest`, which calculates clustered standard errors from model fits. Would you mind designing a [self-contained minimal example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) to show us what you're doing? – jay.sf Jul 10 '19 at 09:24
  • Why the Rcpp tag? – Ralf Stubner Jul 10 '19 at 09:25
  • Agree with @Ralf and suggest you remove the `rcpp` tag. – Dirk Eddelbuettel Jul 10 '19 at 09:36
  • @RalfStubner You are right, sorry. I had planned to target the question specifically at fastLm() from the RcppEigen package, and to ask whether there are Rcpp routines for computing SE corrections quickly (which I assume would benefit a lot from Rcpp). In the end I went for a more general question and forgot to remove the tag. Sorry, and thanks for pointing that out! – leo_damico Jul 10 '19 at 10:13
  • @jay.sf Hi jay, example added! Thanks a lot! – leo_damico Jul 10 '19 at 14:01

0 Answers