I have a very large panel data set in R. I tried to run a plm regression and received the error "cannot allocate vector of size 11 Gb". I found that running the regression in chunks could be a solution, so I tried the biglm and/or ff packages.
My question is: can I use the biglm and/or ff packages to estimate a random effects model?
Using the Wooldridge data and the nice example given by Gilles San Martin, I used the following:
# install.packages(c("wooldridge", "plm", "stargazer", "lme4", "biglm"), dependencies = TRUE)
library(wooldridge)
library(plm)
library(lme4)
library(biglm)
data(wagepan)
formula <- lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + factor(year)
First, I ran the pooled OLS regression using three different packages:
Pooled.ols <- plm(formula, data = wagepan, index=c("nr","year"), model="pooling")
Pooled.ols.lm <- lm(formula, data = wagepan)
Pooled.OLS.biglm<- bigglm(formula,data=wagepan, chunksize=10, sandwich=TRUE)
Then I ran the random effects regressions:
random.effects <- plm(formula, data = wagepan, index = c("nr","year") , model = "random")
random.effects.lme4 <- lmer(lwage ~ educ + black + hisp + exper + I(exper^2) + married +
union + factor(year) + (1|nr), data = wagepan)
I was not able to put the biglm regression into the stargazer table, so I did the following:
stargazer::stargazer(Pooled.ols, Pooled.ols.lm, random.effects, random.effects.lme4, type = "text",
                     column.labels = c("OLS pooled PLM", "OLS Pooled LM", "Random Effects PLM", "Random Effects Lme4"),
                     dep.var.labels = c("log(wage)"), keep.stat = c("n"),
                     keep = c("edu", "bla", "his", "exp", "marr", "union"), align = TRUE, digits = 4)
summary(Pooled.OLS.biglm)
=============================================================================================
              Dependent variable:
              -------------------------------------------------------------------------------
              log(wage)
              panel               OLS                 panel               linear
              linear                                  linear              mixed-effects
              OLS pooled PLM      OLS Pooled LM       Random Effects PLM  Random Effects Lme4
              (1)                 (2)                 (3)                 (4)
---------------------------------------------------------------------------------------------
educ          0.0913***           0.0913***           0.0919***           0.0919***
              (0.0052)            (0.0052)            (0.0107)            (0.0108)
black         -0.1392***          -0.1392***          -0.1394***          -0.1394***
              (0.0236)            (0.0236)            (0.0477)            (0.0485)
hisp          0.0160              0.0160              0.0217              0.0218
              (0.0208)            (0.0208)            (0.0426)            (0.0433)
exper         0.0672***           0.0672***           0.1058***           0.1060***
              (0.0137)            (0.0137)            (0.0154)            (0.0155)
I(exper2)     -0.0024***          -0.0024***          -0.0047***          -0.0047***
              (0.0008)            (0.0008)            (0.0007)            (0.0007)
married       0.1083***           0.1083***           0.0640***           0.0635***
              (0.0157)            (0.0157)            (0.0168)            (0.0168)
union         0.1825***           0.1825***           0.1061***           0.1053***
              (0.0172)            (0.0172)            (0.0179)            (0.0179)
---------------------------------------------------------------------------------------------
Observations  4,360               4,360               4,360               4,360
=============================================================================================
Note:         *p<0.1; **p<0.05; ***p<0.01
>summary(Pooled.OLS.biglm)
Large data regression model: bigglm(formula, data = wagepan, chunksize = 10, sandwich = TRUE)
Sample size = 4360
failed to converge after 15 iterations
                     Coef    (95%     CI)      SE       p
(Intercept)        0.0921 -0.0666  0.2507  0.0793  0.2458
educ               0.0913  0.0808  0.1019  0.0053  0.0000
black             -0.1392 -0.1882 -0.0902  0.0245  0.0000
hisp               0.0160 -0.0234  0.0554  0.0197  0.4158
exper              0.0672  0.0407  0.0938  0.0133  0.0000
I(exper^2)        -0.0024 -0.0039 -0.0009  0.0008  0.0017
married            0.1083  0.0778  0.1387  0.0152  0.0000
union              0.1825  0.1500  0.2150  0.0163  0.0000
factor(year)1981   0.0583 -0.0057  0.1224  0.0320  0.0686
factor(year)1982   0.0628 -0.0055  0.1310  0.0341  0.0659
factor(year)1983   0.0620 -0.0107  0.1347  0.0364  0.0882
factor(year)1984   0.0905  0.0084  0.1725  0.0410  0.0275
factor(year)1985   0.1092  0.0215  0.1970  0.0439  0.0128
factor(year)1986   0.1420  0.0478  0.2361  0.0471  0.0026
factor(year)1987   0.1738  0.0759  0.2718  0.0490  0.0004
Sandwich (model-robust) standard errors
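One can easily see that the coefficients from "Pooled.OLS.biglm" match those from "Pooled.ols" and "Pooled.ols.lm". Only the standard errors and p-values differ slightly, presumably because I asked bigglm for sandwich (model-robust) standard errors. That's good.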
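For anyone who wants to check the pooled fits numerically rather than by eye, here is a minimal sketch. It refits the three pooled models with the same calls as above so that it runs on its own; coef_table and max_gap are names I made up for illustration, and bigglm may still print its convergence message.

```r
# Sketch: compare pooled OLS coefficients from plm, lm and bigglm.
library(wooldridge)
library(plm)
library(biglm)

data(wagepan)
formula <- lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + factor(year)

# Same three pooled fits as in the question.
Pooled.ols       <- plm(formula, data = wagepan, index = c("nr", "year"), model = "pooling")
Pooled.ols.lm    <- lm(formula, data = wagepan)
Pooled.OLS.biglm <- bigglm(formula, data = wagepan, chunksize = 10, sandwich = TRUE)

# coef_table and max_gap are illustrative names, not part of any package.
# One column of coefficients per fit, matched by position.
coef_table <- cbind(plm   = coef(Pooled.ols),
                    lm    = coef(Pooled.ols.lm),
                    biglm = coef(Pooled.OLS.biglm))

# Largest absolute difference of any column from the lm estimates.
max_gap <- max(abs(coef_table - coef_table[, "lm"]))

print(round(coef_table, 4))
print(max_gap)
```

This only compares point estimates. The standard errors are not expected to agree, because the bigglm call requests sandwich standard errors.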
How can I estimate a random effects model using biglm? Or, if this is not possible, how can I use the ff package so that plm can handle large data sets?