I have not used the `biglm` package before, but from what you describe, you ran out of memory when calling `predict` on a new dataset of nearly 7,000,000 rows. To get around the memory limit, prediction must be done chunk-wise: for example, you iteratively predict 20,000 rows at a time (see the sketch below). I am not sure whether `predict.bigglm` can do chunk-wise prediction by itself.
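To illustrate the idea, here is a minimal sketch of such a loop. The names `fit` and `newdata` are placeholders, and I am assuming `predict` returns one fitted value per row of each chunk; whether this actually avoids the memory blow-up depends on what `predict.bigglm` does internally per call.

```r
## minimal sketch of chunk-wise prediction
## `fit` = your fitted model, `newdata` = the ~7,000,000-row data frame
chunk_size <- 20000
n <- nrow(newdata)
pred <- numeric(n)                  # pre-allocate the full result
for (s in seq(1, n, by = chunk_size)) {
  e <- min(s + chunk_size - 1, n)   # last row of this chunk
  ## predict one chunk; as.numeric flattens a one-column matrix result
  pred[s:e] <- as.numeric(predict(fit, newdata = newdata[s:e, , drop = FALSE]))
}
```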
Why not have a look at the `mgcv` package? `bam` can fit linear models, generalized linear models, generalized additive models, etc., to large datasets. Like `biglm`, it performs chunk-wise matrix factorization when fitting the model. But `predict.bam` also supports chunk-wise prediction, which is exactly what you need here. Furthermore, it can do parallel model fitting and prediction, backed by the `parallel` package (use the `cluster` argument of `bam()`; see the examples under `?bam` and `?predict.bam` for parallel usage).
Just do `library(mgcv)` and check `?bam` and `?predict.bam`.
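A minimal sketch of the workflow; the formula, family, data frame names, and cluster size below are hypothetical, so replace them with your own:

```r
library(mgcv)
library(parallel)

cl <- makeCluster(4)     # a cluster of 4 worker processes

## hypothetical model: substitute your own formula / family / data
fit <- bam(y ~ x1 + x2, family = gaussian(), data = dat, cluster = cl)

## chunk-wise, parallel prediction; block.size sets the chunk size
pred <- predict(fit, newdata = newdata, cluster = cl, block.size = 50000)

stopCluster(cl)          # release the workers when done
```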
**Remark**

Do not use the `nthreads` argument for parallelism here; it is not useful for parametric regression.