I am running a linear regression with many levels of fixed effects and which takes a long time to run.
Due to efficiency reasons, I would like to:
- demean the variables once (for this I am using
demeanlist()
from thelfe
package. - save the demean matrix
- run
lm.fit()
on the demeaned matrix instead oflm()
for efficiency reasons (the dataset has more than 50 Million rows) - save the output from
lm.fit()
- apply to the output the SE correction to account for
- clustering/heteroskedasticity (ideally here I would like to try different things without having to rerun the model every time)
- the true number of DoF which is lower than the default one in
lm.fit()
sincelm.fit()
does not take into account the demeaning part.
- output to Latex with stargazer
I have tried 1-4 successfully and now I am wondering how to tackle 5. Ideally also 6 but of course it is minor.
I am open of course to alternatives to 4. I do not have to strictly run lm.fit()
, I am ok with anything from fastLm()
, felm()
EDIT: Minimal self-contained example
library(fastDummies)
library(felm)
library(lfe)
data <- data.frame(author=c("a","a","a","a","b","b","b","b","c","c","c","c"),
date=c(1,1,2,2,1,1,2,2,1,1,2,2), sub=c("political", "general", "political", "general","political", "general", "political", "general","political", "general", "political", "general"), treatment1=c(1,0,0,0,0,1,0,0,0,1,1,1), outcome=c(0,2,5,5,7,0,1,1,23,3,10,11), treatment2= c(1,1,0,0,1,0,0,1,0,0,0,0))
yX <- data[,c("treatment1", "treatment2", "outcome")]
cx <- demeanlist(yX, list(as.factor(data$author), as.factor(data$sub), as.factor(data$date)))
x <- lm.fit(as.matrix(cx[,1:2]), as.matrix(cx[3]))
I want now to have a summary of x which I can output to Latex but in which I may correct the DoFs and I can cluster SE or use heteroskedasticity-robust SEs.