
I am trying to run a beta regression model in R using the betareg package. However, I am surprisingly running into a memory size issue. Let me explain why this surprises me.

I am using Windows 7 (64-bit), 64-bit R, and have 32 GB of RAM.

The betareg command I am running is:

br1 <- betareg(dfp ~ ago + evl + spe + low + poly(fpt, 4, raw = T), data = tt[tt$zero_one_ind == 1, ], model = T, y = F, x = F)

The object size of the betareg model is:

print(object.size(br1), units = "Mb")
46 Mb

This is the error message I am receiving:

Error: cannot allocate vector of size 344.1 Gb
In addition: Warning messages:
1: In diag(x %*% xwx1 %*% t(x)) :
  Reached total allocation of 32693Mb: see help(memory.size)
2: In diag(x %*% xwx1 %*% t(x)) :
  Reached total allocation of 32693Mb: see help(memory.size)
3: In diag(x %*% xwx1 %*% t(x)) :
  Reached total allocation of 32693Mb: see help(memory.size)
4: In diag(x %*% xwx1 %*% t(x)) :
  Reached total allocation of 32693Mb: see help(memory.size)

The betareg model ran successfully in R and estimated the coefficients, and as far as I can tell all of the slots are filled, but it looks like R is unable to construct the variance-covariance matrix. Any pointers to what is going wrong here?

user1738753
  • Isn't what's going wrong very clear? R ran out of memory. What exactly is your question? – joran Oct 14 '14 at 19:50
  • @joran With a dataset that is 1 million observations, it does not seem likely that we need 344.1 GB to build a var-cov matrix. I could be wrong, but it seems a little bit strange. – user1738753 Oct 14 '14 at 19:54
  • Strange or not, it is what it is. Without a detailed understanding of the precise code involved in constructing the matrix, I'm not sure how you'd have any intuition one way or another. Perhaps the betareg code isn't as efficient as it could be, or perhaps that's just the nature of the calculations involved. Either way, there isn't much you can do other than get more RAM or fit a smaller/more efficient model. – joran Oct 14 '14 at 20:03
  • Are some of your predictor variables factors, with large numbers of levels? You can easily generate a giant model matrix that way. In principle sparse model matrices can be constructed, but I don't know if `betareg` has those capabilities. (What are the dimensions and object size of `model.matrix(~ ago + evl + spe + low + poly(fpt, 4, raw = T), data = ...)`?) – Ben Bolker Oct 14 '14 at 20:46
  • @BenBolker I am currently not in the office and will double check whether conversion to factor variables happened along the way. I assume that this is not the case, but will check. The model.matrix should have 1,000,000 rows and 8 columns. I have been able to successfully run gam and gamlss (with smoothing terms) on the same dataset; the specifications for those models are much more complex. I am really shocked that betareg needs 344.1 GB to build the var-cov matrix; that seems like a lot. – user1738753 Oct 15 '14 at 16:59
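As an illustration of the check Ben Bolker suggests above, here is a minimal sketch (reusing the tt data frame, the predictors, and the subset from the question) that builds the model matrix by hand and reports its dimensions and size:

# Build the model matrix for the mean part of the question's formula
mm <- model.matrix(~ ago + evl + spe + low + poly(fpt, 4, raw = TRUE),
                   data = tt[tt$zero_one_ind == 1, ])
dim(mm)                               # number of rows and columns
print(object.size(mm), units = "Mb")  # memory footprint of the model matrix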

2 Answers


I had the same problem; the solution is quite simple.

From the manual:

Note that the default residuals "sweighted2" might be burdensome to compute in large samples and hence might need modification in such applications.

You could, for instance, use one of the other residual types in the summary:

type = c("pearson", "deviance", "response", "weighted", "sweighted", "sweighted2")
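For example, a minimal sketch (assuming the br1 model from the question) that asks summary() for Pearson residuals instead of the default:

# Summary with a cheaper residual type; this skips the default "sweighted2"
# residuals, which appear to be what triggers the huge allocation above
summary(br1, type = "pearson")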

  • Can you change that in the betareg() call directly so the model object already has the "lighter" residuals? Or how can you otherwise make it work in stargazer? – jlp Jan 13 '22 at 03:17

I had the same problem using betareg. I was only interested in the p-values for coefficient significance, so this worked for me as a workaround:

# Fit the beta regression model
fit_frst_spnd_model <- betareg(formula = frst_spnd_util_pc2 ~ .,
                               data = train_data_frst_txn2_2)

# Test coefficient significance without calling summary()
library(lmtest)
coeftest(fit_frst_spnd_model)
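For reference, a sketch of the same workaround applied to the br1 model from the question (assuming it was fitted as shown there):

library(lmtest)
# Coefficient table built from coef() and vcov() only, so the expensive
# residual computation done by summary() is never run
coeftest(br1)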
Suraj Rao