I have not used the `biglm` package before, but from what you describe, you ran out of memory when calling `predict` on a new dataset of nearly 7,000,000 rows. To get around the memory limit, prediction must be done chunk-wise: for example, you iteratively predict 20,000 rows at a time (see the sketch below). I am not sure whether `predict.bigglm` can do chunk-wise prediction by itself.
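To illustrate the idea, here is a minimal sketch of such a loop. The names `fit` and `newdata` are placeholders, and I am assuming `predict` returns one fitted value per row of each chunk; whether this actually avoids the memory blow-up depends on what `predict.bigglm` does internally per call.

```r
## minimal sketch of chunk-wise prediction
## `fit` = your fitted model, `newdata` = the ~7,000,000-row data frame
chunk_size <- 20000
n <- nrow(newdata)
pred <- numeric(n)                  # pre-allocate the full result
for (s in seq(1, n, by = chunk_size)) {
  e <- min(s + chunk_size - 1, n)   # last row of this chunk
  ## predict one chunk; as.numeric flattens a one-column matrix result
  pred[s:e] <- as.numeric(predict(fit, newdata = newdata[s:e, , drop = FALSE]))
}
```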
Why not have a look at the `mgcv` package? `bam` can fit linear models, generalized linear models, generalized additive models, etc., to large datasets. Like `biglm`, it performs chunk-wise matrix factorization when fitting the model. But `predict.bam` also supports chunk-wise prediction, which is exactly what you need here. Furthermore, it can do parallel model fitting and prediction, backed by the `parallel` package (use the `cluster` argument of `bam()`; see the examples under `?bam` and `?predict.bam` for parallel usage).
Just do `library(mgcv)` and check `?bam` and `?predict.bam`.
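A minimal sketch of the workflow; the formula, family, data frame names, and cluster size below are hypothetical, so replace them with your own:

```r
library(mgcv)
library(parallel)

cl <- makeCluster(4)     # a cluster of 4 worker processes

## hypothetical model: substitute your own formula / family / data
fit <- bam(y ~ x1 + x2, family = gaussian(), data = dat, cluster = cl)

## chunk-wise, parallel prediction; block.size sets the chunk size
pred <- predict(fit, newdata = newdata, cluster = cl, block.size = 50000)

stopCluster(cl)          # release the workers when done
```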
**Remark**

Do not use the `nthreads` argument for parallelism here; it is not useful for parametric regression.