
I have a very large panel dataset in R. When I tried to run a plm regression, I received the error "cannot allocate vector of size 11 Gb". I found out that running the regression in chunks could be a solution, so I tried to use the biglm and/or ff packages.
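Why chunking works for a linear model: the OLS solution depends on the data only through X'X and X'y, and both are sums over rows, so they can be accumulated one chunk at a time without ever holding the full model matrix in memory. biglm does this incrementally (with a numerically more stable QR update), and its `update()` method adds a new chunk to an existing fit. The base-R sketch below only illustrates the principle; the simulated data, column names and chunk size are made up for the example.

```r
# Sketch of "regression in chunks": accumulate X'X and X'y chunk by chunk,
# then solve the normal equations once at the end.
# Simulated data; x1, x2, y are hypothetical column names.
set.seed(1)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 0.5 * dat$x1 - 0.2 * dat$x2 + rnorm(n)

chunked_ols <- function(formula, data, chunksize) {
  XtX <- 0
  Xty <- 0
  for (s in seq(1, nrow(data), by = chunksize)) {
    # In a real application this chunk would be read from disk
    chunk <- data[s:min(s + chunksize - 1, nrow(data)), , drop = FALSE]
    X <- model.matrix(formula, chunk)
    y <- model.response(model.frame(formula, chunk))
    XtX <- XtX + crossprod(X)     # running sum of X'X
    Xty <- Xty + crossprod(X, y)  # running sum of X'y
  }
  drop(solve(XtX, Xty))
}

beta_chunks <- chunked_ols(y ~ x1 + x2, dat, chunksize = 1000)
beta_lm <- coef(lm(y ~ x1 + x2, data = dat))
all.equal(beta_chunks, beta_lm)  # should agree up to numerical tolerance
```

One caveat of this naive version: factor variables (such as `factor(year)`) must produce the same dummy columns in every chunk, so each chunk needs the full set of factor levels.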

My question is: can I use the biglm and/or ff packages to estimate a random effects model?

Using the Wooldridge data and the nice example given by Gilles San Martin, I used the following:

# install.packages(c("wooldridge", "plm", "stargazer", "lme4", "biglm"), dependencies = TRUE)
library(wooldridge)
library(plm)
library(lme4)
library(biglm)

data(wagepan)
formula <- lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + factor(year)

First, I ran pooled OLS regressions using three different packages:

Pooled.ols <- plm(formula, data = wagepan, index=c("nr","year"), model="pooling")

Pooled.ols.lm <- lm(formula, data = wagepan)

Pooled.OLS.biglm <- bigglm(formula, data = wagepan, chunksize = 10, sandwich = TRUE)

Then I ran the random effects regressions:

random.effects <- plm(formula, data = wagepan, index = c("nr","year") , model = "random") 

random.effects.lme4 <- lmer(lwage ~ educ + black + hisp + exper + I(exper^2) + married + 
                              union + factor(year) + (1|nr), data = wagepan) 
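For reference, in a balanced panel with T periods the random effects (GLS) estimator can be written as pooled OLS on quasi-demeaned data; up to how the variance components are estimated, this is the transformation behind plm's `model = "random"`. The notation below is the standard textbook one, not taken from the question: u_i is the individual effect, ε_it the idiosyncratic error, and v_it = u_i + ε_it.

```latex
y_{it} - \theta \bar{y}_i
  = (1 - \theta)\,\beta_0 + (x_{it} - \theta \bar{x}_i)'\beta + (v_{it} - \theta \bar{v}_i),
\qquad
\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\,\sigma_u^2}}
```

Setting θ = 0 gives pooled OLS, and θ → 1 gives the within (fixed effects) estimator. Because the transformed equation is an ordinary linear regression, it can in principle be fitted chunk by chunk once θ (that is, the variance components) has been estimated.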

I was not able to include the biglm regression in the stargazer table, so I did the following:

stargazer::stargazer(Pooled.ols,Pooled.ols.lm,random.effects,random.effects.lme4, type="text",
                     column.labels=c("OLS pooled PLM","OLS Pooled LM","Random Effects PLM","Random Effects Lme4"), 
                     dep.var.labels = c("log(wage)"), keep.stat=c("n"),
                     keep=c("edu","bla","his","exp","marr","union"), align = TRUE, digits = 4)

summary(Pooled.OLS.biglm)
================================================================================
                                     Dependent variable:                        
             -------------------------------------------------------------------
                                          log(wage)                             
                 panel           OLS            panel              linear       
                 linear                         linear          mixed-effects   
             OLS pooled PLM OLS Pooled LM Random Effects PLM Random Effects Lme4
                  (1)            (2)             (3)                 (4)        
--------------------------------------------------------------------------------
educ           0.0913***      0.0913***       0.0919***           0.0919***     
                (0.0052)      (0.0052)         (0.0107)           (0.0108)      
                                                                                
black          -0.1392***    -0.1392***       -0.1394***         -0.1394***     
                (0.0236)      (0.0236)         (0.0477)           (0.0485)      
                                                                                
hisp             0.0160        0.0160           0.0217             0.0218       
                (0.0208)      (0.0208)         (0.0426)           (0.0433)      
                                                                                
exper          0.0672***      0.0672***       0.1058***           0.1060***     
                (0.0137)      (0.0137)         (0.0154)           (0.0155)      
                                                                                
I(exper2)      -0.0024***    -0.0024***       -0.0047***         -0.0047***     
                (0.0008)      (0.0008)         (0.0007)           (0.0007)      
                                                                                
married        0.1083***      0.1083***       0.0640***           0.0635***     
                (0.0157)      (0.0157)         (0.0168)           (0.0168)      
                                                                                
union          0.1825***      0.1825***       0.1061***           0.1053***     
                (0.0172)      (0.0172)         (0.0179)           (0.0179)      
                                                                                
--------------------------------------------------------------------------------
Observations     4,360          4,360           4,360               4,360       
================================================================================
Note:                                                *p<0.1; **p<0.05; ***p<0.01

>summary(Pooled.OLS.biglm)
Large data regression model: bigglm(formula, data = wagepan, chunksize = 10, sandwich = TRUE)
Sample size =  4360 
failed to converge after 15  iterations
                    Coef    (95%     CI)     SE      p
(Intercept)       0.0921 -0.0666  0.2507 0.0793 0.2458
educ              0.0913  0.0808  0.1019 0.0053 0.0000
black            -0.1392 -0.1882 -0.0902 0.0245 0.0000
hisp              0.0160 -0.0234  0.0554 0.0197 0.4158
exper             0.0672  0.0407  0.0938 0.0133 0.0000
I(exper^2)       -0.0024 -0.0039 -0.0009 0.0008 0.0017
married           0.1083  0.0778  0.1387 0.0152 0.0000
union             0.1825  0.1500  0.2150 0.0163 0.0000
factor(year)1981  0.0583 -0.0057  0.1224 0.0320 0.0686
factor(year)1982  0.0628 -0.0055  0.1310 0.0341 0.0659
factor(year)1983  0.0620 -0.0107  0.1347 0.0364 0.0882
factor(year)1984  0.0905  0.0084  0.1725 0.0410 0.0275
factor(year)1985  0.1092  0.0215  0.1970 0.0439 0.0128
factor(year)1986  0.1420  0.0478  0.2361 0.0471 0.0026
factor(year)1987  0.1738  0.0759  0.2718 0.0490 0.0004
Sandwich (model-robust) standard errors

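One can easily see that the coefficients of "Pooled.OLS.biglm" are the same as those of "Pooled.ols" and "Pooled.ols.lm"; only the standard errors and p-values differ slightly, since bigglm reports sandwich standard errors here. That's good.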
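The pooled fits share the same coefficients, but bigglm was called with `sandwich = TRUE`, so its standard errors (e.g. 0.0053 for educ versus 0.0052 from lm) are model-robust rather than classical. The sketch below computes both kinds for one OLS fit on simulated heteroskedastic data, using the Huber-White (HC0) form; biglm's exact finite-sample details may differ, so treat this as an illustration of the idea, not of biglm's code.

```r
# Sketch: classical vs. HC0 sandwich standard errors for the same OLS fit.
# Simulated data; all names are illustrative.
set.seed(7)
n <- 2000
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.5 + abs(x))  # error sd grows with |x|
X <- cbind(intercept = 1, x = x)

XtX_inv <- solve(crossprod(X))
beta <- drop(XtX_inv %*% crossprod(X, y))
e <- drop(y - X %*% beta)

# Classical: sigma^2 (X'X)^{-1}
sigma2 <- sum(e^2) / (n - ncol(X))
se_classic <- sqrt(diag(sigma2 * XtX_inv))

# Sandwich (HC0): (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
meat <- crossprod(X * e)  # X * e scales row i by e_i, so this is X' diag(e^2) X
se_sandwich <- sqrt(diag(XtX_inv %*% meat %*% XtX_inv))

rbind(classical = se_classic, sandwich = se_sandwich)
```

The point estimates are identical in both cases; only the variance estimate (and hence the p-values) changes.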

How can I estimate a random effects model using biglm? Or, if this is not possible, how can I use the ff package so that plm can handle large datasets?
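One direction worth exploring, sketched here under explicit assumptions rather than as a tested recipe: the random effects estimator equals pooled OLS on quasi-demeaned data, and that transformation only needs each individual's own means. If chunks never split an individual's rows, each chunk can be transformed on its own and the cross-products accumulated (the accumulation step is what biglm's `update()` does, more stably). Assumed here, and not from the question: a balanced panel, chunks aligned with individuals, and variance components that are already known (in practice they would have to be estimated first, e.g. on a manageable subsample). Unbalanced panels need an individual-specific θ.

```r
# Sketch only: random effects as chunked pooled OLS on quasi-demeaned data.
# Assumptions: balanced panel, chunks never split an individual's rows,
# sigma2_e and sigma2_u known (here: the simulated true values).
set.seed(42)
n_id <- 500
n_t <- 5
id <- rep(seq_len(n_id), each = n_t)
x <- rnorm(n_id * n_t)
u <- rnorm(n_id)[id]                      # individual random effect
y <- 1 + 0.5 * x + u + rnorm(n_id * n_t)  # true intercept 1, slope 0.5
panel <- data.frame(id = id, x = x, y = y)

sigma2_e <- 1
sigma2_u <- 1
theta <- 1 - sqrt(sigma2_e / (sigma2_e + n_t * sigma2_u))

# v_it - theta * mean_i(v); ave() returns each individual's mean per row
quasi_demean <- function(v, g, theta) v - theta * ave(v, g)

XtX <- 0
Xty <- 0
ids_per_chunk <- 100
for (first in seq(1, n_id, by = ids_per_chunk)) {
  # In a real application this chunk would be read from disk
  chunk <- panel[panel$id >= first & panel$id < first + ids_per_chunk, ]
  X <- cbind(const = rep(1 - theta, nrow(chunk)),  # transformed intercept
             x = quasi_demean(chunk$x, chunk$id, theta))
  y_star <- quasi_demean(chunk$y, chunk$id, theta)
  XtX <- XtX + crossprod(X)
  Xty <- Xty + crossprod(X, y_star)
}
beta_re <- drop(solve(XtX, Xty))  # coefficient on "const" estimates beta_0
beta_re
```

Only the coefficients are targeted here; the usual random effects standard errors would need the residual variance of the transformed regression as well.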
