
I am running regressions with large samples and many covariates, resulting in lm objects of around 10 GB each (and I need to run dozens of regressions). I want to save the regression outputs, then later load them and create tables using Stargazer. I have been doing this in the following way:

#Fake dataset
set.seed(1)
x <- rnorm(100)
z <- rnorm(100)
w <- rnorm(100)
# y is built from x and w, so they must exist before data.frame() is called
dataset <- data.frame(x = x, z = z, w = w, y = rnorm(100) + 2*x + 3*w)

#Running regressions and storing them
reg1<-lm(y ~ x + z, data=dataset)
saveRDS(reg1, "reg1.rds")
rm(reg1)

reg2<-lm(y ~ x + w, data=dataset)
saveRDS(reg2, "reg2.rds")
rm(reg2)

#Later, I decide which models to report and export output tables using Stargazer
library(stargazer)
reg1 <- readRDS("reg1.rds")  # base R counterpart of saveRDS()
reg2 <- readRDS("reg2.rds")
stargazer(reg1, reg2, type = "text")

Here is the Stargazer output for this example (sorry, I am a new member and Stack Overflow does not allow me to embed images in my posts).

The problem is that the rds files are too large and take up a lot of space on my hard drive. The problem persists even when I set the option model=FALSE in the lm() function. Is there another way to do what I have been doing without using so much disk space?

  • If all you want to keep are coefficients and standard errors, then `coef(reg1)` and `vcov(reg1)` will extract just the info you need. Note that `reg1` is a list that presumably has a lot more in it than you need. Another alternative would be to remove the large items you don't need from each list, e.g., `reg1$residuals <- NULL`. – DanY Nov 06 '18 at 17:24
  • Another way to save space is to write only `summary(reg1)$coefficients` to disk. This is a matrix with columns `'Estimate'`, `'Std. Error'`, `'t value'` and `'Pr(>|t|)'`. – Rui Barradas Nov 06 '18 at 17:26
  • @DanY I know I can store the regression outputs that way, but the problem is that I do not know how to use Stargazer later to create a table from results stored like that. – Gabriel Oliva C. Cunha Nov 06 '18 at 17:37
  • @RuiBarradas This is a nice way to store the results. However, if I use summary(reg1)$coefficients as input to stargazer, it prints the whole matrix as-is instead of creating a regression table with the reg1 results in the first column. – Gabriel Oliva C. Cunha Nov 06 '18 at 17:46
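A minimal base-R sketch of what the two comments above suggest, using simulated data with the same design as the question. It prints how many bytes each component of the `lm` object takes up, then saves only the coefficients, the covariance matrix, and the coefficient table. The list name `reg1_small` and the temporary file path are illustrative choices, not anything from the thread. As the replies point out, a plain list like this is not something stargazer accepts directly.

```r
# Simulated data with the same design as the question's example
set.seed(1)
x <- rnorm(100); z <- rnorm(100); w <- rnorm(100)
dataset <- data.frame(x, z, w, y = rnorm(100) + 2 * x + 3 * w)
reg1 <- lm(y ~ x + z, data = dataset)

# Bytes used by each component of the lm object, largest first; the pieces
# with one entry per observation grow with the sample size
comp_size <- sort(sapply(reg1, function(el) as.numeric(object.size(el))),
                  decreasing = TRUE)
print(comp_size)

# What the comments suggest keeping instead: pieces that grow only with the
# number of coefficients
reg1_small <- list(
  coef  = coef(reg1),                  # point estimates
  vcov  = vcov(reg1),                  # covariance matrix; SEs = sqrt(diag(vcov))
  table = summary(reg1)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
)
path <- file.path(tempdir(), "reg1_small.rds")  # illustrative file name
saveRDS(reg1_small, path)

cat("full lm:", format(object.size(reg1), units = "Kb"),
    "| compact list:", format(object.size(reg1_small), units = "Kb"), "\n")
```

With a real sample of millions of rows, the gap between the two sizes grows roughly in proportion to the number of observations, because none of the kept pieces depend on it.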

1 Answer


The advice to store the summary results is actually on point for your goal. What you also needed to know was exactly which values stargazer takes from a model object. That's not described in much detail in the help pages, but it's fairly obvious once you look at what the code is doing. Here's the top of the core function used by stargazer. You might be able to see it if your console stores enough lines (my RStudio installation does not, so I viewed it in an editor after downloading the package source from CRAN and unpacking it):

stargazer:::.stargazer.wrap  # scrolls off the top of my console
# cut from stargazer-internal.R
.stargazer.wrap <-
  function(..., type, title, style, summary, out, out.header, covariate.labels, column.labels, column.separate, 
           dep.var.caption, dep.var.labels, dep.var.labels.include, align, coef, se, t, p, t.auto, 
           p.auto, ci, ci.custom, ci.level, ci.separator, add.lines, apply.coef, apply.se, apply.t, apply.p, apply.ci,
           colnames,
           column.sep.width, decimal.mark, df, digit.separate, digit.separator, digits, digits.extra, 
           flip, float, 
           float.env, font.size, header, initial.zero, intercept.bottom, intercept.top, keep, keep.stat, 
           label, model.names, model.numbers, multicolumn, no.space, notes, notes.align, notes.append, 
           notes.label, object.names, omit, omit.labels, omit.stat, omit.summary.stat, omit.table.layout,
           omit.yes.no, order, ord.intercepts, perl, report, rownames,
           rq.se, selection.equation, single.row, star.char, star.cutoffs, suppress.errors, 
           table.layout, table.placement, 
           zero.component, summary.logical, summary.stat, nobs, mean.sd, min.max, median, iqr, warn) {

  .add.model <-
  function(object.name, user.coef=NULL, user.se=NULL, user.t=NULL, user.p=NULL, auto.t=TRUE, auto.p=TRUE, user.ci.lb=NULL, user.ci.rb=NULL) {

    if (class(object.name)[1] == "Glm") {
        .summary.object <<- summary.glm(object.name)
    }
    else if (!(.model.identify(object.name) %in% c("aftreg", "coxreg","phreg","weibreg", "Glm", "bj", "cph", "lrm", "ols", "psm", "Rq"))) {
      .summary.object <<- summary(object.name)
    }
    else {
      .summary.object <<- object.name
    }
    # ... (rest of .add.model and .stargazer.wrap omitted)

So all you need to do in order to trick stargazer is change the class of the summary object contents to the class of the original model.
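The class-change trick itself is not spelled out in this answer, so here is a different route, offered as a sketch under stated assumptions rather than as the answerer's method. It relies only on arguments that appear in the `.stargazer.wrap` signature above (`coef`, `se`, `p`, `p.auto`, `omit.stat`, `add.lines`) and not on stargazer's internals. For each model it saves the estimates, standard errors, p-values, N and R² from the full fit, plus a tiny template `lm` fitted with the same formula on the first 20 rows, so that stargazer has a model object to lay out. The helper `table_numbers()`, the file names, and the 20-row template size are hypothetical choices. The template must produce exactly the same coefficient names as the full fit (watch out for factor levels missing from the first rows), since stargazer matches the supplied vectors to covariates by name.

```r
library(stargazer)

# Simulated data with the same design as the question's example
set.seed(1)
x <- rnorm(100); z <- rnorm(100); w <- rnorm(100)
dataset <- data.frame(x, z, w, y = rnorm(100) + 2 * x + 3 * w)

# Hypothetical helper (not part of stargazer): the numbers a table needs
table_numbers <- function(fit) {
  s <- summary(fit)
  list(coef = s$coefficients[, "Estimate"],
       se   = s$coefficients[, "Std. Error"],
       p    = s$coefficients[, "Pr(>|t|)"],
       n    = nobs(fit),
       r2   = s$r.squared)
}

dir <- tempdir()

# Fit on the full data, then save the numbers plus a tiny "template" model
# (same formula, first 20 rows) that stargazer can lay out
reg1 <- lm(y ~ x + z, data = dataset)
saveRDS(c(table_numbers(reg1),
          list(template = lm(y ~ x + z, data = head(dataset, 20)))),
        file.path(dir, "reg1_small.rds"))
rm(reg1)

reg2 <- lm(y ~ x + w, data = dataset)
saveRDS(c(table_numbers(reg2),
          list(template = lm(y ~ x + w, data = head(dataset, 20)))),
        file.path(dir, "reg2_small.rds"))
rm(reg2)

# Later: reload only the small files. The template estimates are replaced by
# the stored ones, and the template's own fit statistics are dropped
m <- lapply(file.path(dir, c("reg1_small.rds", "reg2_small.rds")), readRDS)
tab <- capture.output(stargazer(
  m[[1]]$template, m[[2]]$template,
  coef = lapply(m, `[[`, "coef"),
  se   = lapply(m, `[[`, "se"),
  p    = lapply(m, `[[`, "p"),
  p.auto = FALSE,
  omit.stat = c("n", "rsq", "adj.rsq", "ser", "f"),
  add.lines = list(c("Observations", sapply(m, `[[`, "n")),
                   c("R2", sprintf("%.3f", sapply(m, `[[`, "r2")))),
  type = "text"
))
cat(tab, sep = "\n")
```

N and R² go in through `add.lines` because the template's own statistics describe the 20-row fit, not the real one. Any other statistic you want in the table (adjusted R², F) would need to be stored and added the same way.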

(saving this and will return with example code.)

Oops. I went back to your question to set up tested code, but sadly it doesn't have a minimal, complete, and verifiable example (MCVE). I would have added code here to accomplish the goal, but I generally reserve that for questions with complete examples. See How to make a great R reproducible example, and edit your question if it is not already reproducible.

IRTFM
  • Thanks for your help. I updated the post and now I think it is reproducible. I am not sure how I can change the class of the object to lm in a way that is compatible with Stargazer. Could you please help me with that? – Gabriel Oliva C. Cunha Nov 06 '18 at 19:13