I am new to R and I am fitting a logistic regression model. I am trying to run bigglm on my data of 2M records with 100+ variables. My variables are numeric or integer (0/1), because I have coded some of them as indicators, e.g.:
isOK,quantity,weight,isUS,isEU,isASIA
0,2,1.1,0,0,1
1,1,0.9,1,1,0
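For anyone reproducing the layout, the sample rows above can be read into a data frame in base R. This is a minimal illustrative sketch: the names `sample_df`, `indicator_cols` and `all_binary`, and the assumption that every indicator column name starts with `is`, are hypothetical.

```r
# Sample rows from above, read as a data frame (illustrative reconstruction)
sample_df <- read.csv(text = "isOK,quantity,weight,isUS,isEU,isASIA
0,2,1.1,0,0,1
1,1,0.9,1,1,0")

# Assumption: indicator columns are exactly the ones whose names start with "is"
indicator_cols <- grep("^is", names(sample_df), value = TRUE)

# TRUE when every value in every indicator column is 0 or 1
all_binary <- all(vapply(sample_df[indicator_cols],
                         function(col) all(col %in% c(0, 1)),
                         logical(1)))

str(sample_df)
```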
However, bigglm always throws the following error:
Error in coef.bigqr(object$qr) : NA/NaN/Inf in foreign function call (arg 3)
traceback() shows the following:
14: coef.bigqr(object$qr)
13: coef(object$qr)
12: coef.biglm(iwlm)
11: coef(iwlm)
10: bigglm.function(formula = formula, data = datafun, ...)
9: bigglm(formula = formula, data = datafun, ...)
8: bigglm(formula = formula, data = datafun, ...)
7: bigglm.data.frame(myForm, data = myraw.data[i, , drop = FALSE],
family = binomial(link = logit))
6: bigglm(myForm, data = myraw.data[i, , drop = FALSE], family = binomial(link = logit))
5: bigglm(myForm, data = myraw.data[i, , drop = FALSE], family = binomial(link = logit)) at trial.r#48
4: eval(ei, envir)
3: eval(ei, envir)
2: withVisible(eval(ei, envir))
1: source("trial.r")
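Because the error message mentions NA/NaN/Inf, one quick base-R check is to count non-finite values in each numeric column of a chunk before passing it to bigglm. The helper `count_nonfinite` and the `toy` data are hypothetical (not biglm functions or real data), and whether non-finite input values cause this particular error is an assumption to rule out, not a diagnosis.

```r
# Hypothetical helper (not part of biglm): for each numeric column, count the
# values that are NA, NaN, Inf or -Inf.
count_nonfinite <- function(df) {
  num_cols <- vapply(df, is.numeric, logical(1))
  vapply(df[num_cols], function(col) sum(!is.finite(col)), integer(1))
}

# Toy example: one NA in isOK and one Inf in weight
toy <- data.frame(isOK   = c(0, 1, NA),
                  weight = c(1.1, Inf, 0.9),
                  isUS   = c(0, 1, 1))
bad <- count_nonfinite(toy)
print(bad[bad > 0])  # only the columns that contain non-finite values
```

On the real data it would be called one chunk at a time, e.g. `count_nonfinite(myraw.data[1:3000, , drop = FALSE])`, so that all 2M rows never have to be in memory at once.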
From my research, bigglm needs all the possible values/factor levels to be present in the chunk. However, all my variables are numeric or indicators, so I think this is not necessary (please correct me if I'm mistaken). Anyway, I have already rearranged my data set so that in the first chunk (3000 rows in my case, as in the code below), every integer variable has records with both 0 and 1.
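The claim about the first chunk can be checked in base R. The sketch below builds the model matrix for one chunk with `model.matrix()`, lists non-intercept columns that take a single value, and compares the rank reported by `qr()` with the number of columns (a lower rank means some columns are linearly dependent). The helper `check_first_chunk` and the toy data are hypothetical, and that a constant or collinear column triggers this specific bigglm error is an assumption worth ruling out, not a confirmed cause.

```r
# Hypothetical helper: look for constant and linearly dependent columns in the
# design matrix of a single chunk.
check_first_chunk <- function(formula, chunk_df) {
  mm <- model.matrix(formula, data = chunk_df)
  # Columns with a single distinct value, excluding the intercept itself
  is_const <- apply(mm, 2, function(col) length(unique(col)) == 1)
  constant <- setdiff(colnames(mm)[is_const], "(Intercept)")
  # A rank below ncol(mm) means some columns are linear combinations of others
  mm_rank <- qr(mm)$rank
  list(constant = constant, rank = mm_rank, ncol = ncol(mm))
}

# Toy chunk: isEU is always 0, and isASIA = 1 - isUS (dependent on the intercept)
toy_chunk <- data.frame(
  isOK   = c(0, 1, 1, 0),
  weight = c(1.1, 0.9, 1.3, 0.7),
  isUS   = c(1, 0, 1, 0),
  isEU   = c(0, 0, 0, 0),
  isASIA = c(0, 1, 0, 1)
)
res <- check_first_chunk(isOK ~ weight + isUS + isEU + isASIA, toy_chunk)
print(res)
```

On the real data the call would be something like `check_first_chunk(myForm, myraw.data[1:3000, , drop = FALSE])`.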
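for (i in chunk(myraw.data, by = 3000)) {
  if (i[1] == 1) {
    # first chunk: fit the initial model
    myFullLRModel <- bigglm(myForm, data = myraw.data[i, , drop = FALSE], family = binomial(link = logit))
  } else {
    # later chunks: update the existing fit with the new rows
    myFullLRModel <- update(myFullLRModel, myraw.data[i, , drop = FALSE])
  }
}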
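To make the chunking explicit without the package that provides `chunk()`, the row ranges the loop visits can be emulated in base R. `row_chunks` is a hypothetical helper, and it assumes that `chunk(myraw.data, by = 3000)` yields consecutive blocks of at most 3000 row numbers; the last line shows that the `i[1] == 1` test selects only the first block.

```r
# Hypothetical stand-in for chunk(): split row numbers 1..n into consecutive
# blocks of at most `by` rows.
row_chunks <- function(n, by) {
  starts <- seq(1, n, by = by)
  lapply(starts, function(s) s:min(s + by - 1, n))
}

chunks <- row_chunks(7000, by = 3000)
lengths(chunks)               # 3000 3000 1000
sapply(chunks, `[`, 1) == 1   # TRUE FALSE FALSE: only the first block starts at row 1
```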
Could you advise why this error occurs? I cannot use glm instead, because it always fails with an insufficient-memory error.