I've spent hours reading about the ff package and still can't get a handle on this topic. Basically, I'd like to run an analysis on a big data set and save the results/statistics from the analysis.
I modified the biglm example code from the ff package documentation (http://cran.r-project.org/web/packages/ff/ff.pdf) to run on my data set. The problem is very similar to this one: Modeling a very big data set (1.8 Million rows x 270 Columns) in R
Here's my code:
library(ff)
library(ffbase)
library(doSNOW)
registerDoSNOW(makeCluster(4, type = "SOCK"))
memory.limit(size=32000)
setwd('Z:/data')
wd <- getwd()
data.path <- file.path(wd,'ffdb')
data.path.train <- file.path(data.path,'train')
ff.train <- read.table.ffdf(file='train.tsv', sep='\t')
save.ffdf(ff.train, dir=data.path.train)
library(biglm)
# Here I'm implementing biglm model on ffdf data
# Vi represents the column names
form <- V27 ~ V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 + V11 + V12 + V13 + V14 + V15
# A for loop returns NULL, so its value is not assigned; the fitted model lives in biglmfit
for (i in chunk(ff.train, by=500)){
  if (i[1]==1){
    message("first chunk is: ", i[[1]],":",i[[2]])
    biglmfit <- biglm(form, data=ff.train[i,,drop=FALSE])
  }else{
    message("next chunk is: ", i[[1]],":",i[[2]])
    biglmfit <- update(biglmfit, ff.train[i,,drop=FALSE])
  }
}
When the above code is run, it gives the following error message:
first chunk is: 1:494
Error: cannot allocate vector of size 19.4 Gb
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Does this error mean that biglmfit itself is too large to fit in memory? Is there a workaround to save biglmfit as an ffdf data type? Or, for that matter, is there any way to store analysis statistics in an ffdf chunk by chunk? Thank you.
EDIT:
vmode(ff.train)
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
"integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
"integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
V21 V22 V23 V24 V25 V26 V27
"integer" "integer" "integer" "integer" "integer" "integer" "integer"
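For reference, here is the chunked biglm/update pattern I'm trying to follow, reduced to a minimal in-memory sketch. It is an illustration only, not my actual pipeline: the built-in mtcars data set stands in for ff.train, the chunk size of 8 rows is arbitrary, and the formula and the names chunks, fit and full_fit are made up for the example. All predictors here are plain numeric columns, which may not match how my real columns are stored.

```r
library(biglm)

# mtcars stands in for the ffdf (assumption: every column is plain numeric)
form <- mpg ~ wt + hp + disp

# Split the row indices into chunks of 8 rows (arbitrary, hypothetical chunk size)
idx <- seq_len(nrow(mtcars))
chunks <- split(idx, ceiling(idx / 8))

fit <- NULL
for (rows in chunks) {
  chunk_df <- mtcars[rows, , drop = FALSE]
  if (is.null(fit)) {
    # The first chunk initialises the model
    fit <- biglm(form, data = chunk_df)
  } else {
    # Later chunks update the existing fit incrementally
    fit <- update(fit, chunk_df)
  }
}

# Compare against a single lm() fit on the full data
full_fit <- lm(form, data = mtcars)
max_diff <- max(abs(coef(fit) - coef(full_fit)))
print(coef(fit))
print(max_diff)
```

On this toy data the chunked coefficients should agree with the single lm fit up to numerical tolerance, and only one small chunk is handed to biglm at a time.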