My dataset is too big (or the formula too complicated) to run biglm, fastLm, speedlm or lm in one go. Therefore I'm down to splitting my dataset into smaller pieces and performing an update for every 50,000 rows.
Below is a simplified version of what I'm using, with the iris dataset standing in for my own data:
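library(speedglm)
chunk1 <- iris[1:10, ]
chunk2 <- iris[11:20, ]
chunk3 <- iris[21:30, ]
# Fit on the first chunk, then fold in the remaining chunks one at a time
lmfit <- speedlm(Sepal.Length ~ Sepal.Width + Species, chunk1)
for (chunk in list(chunk2, chunk3)) {
  # Reassign lmfit so each update builds on the previous one
  lmfit <- updateWithMoreData(lmfit, chunk)
}
lmfit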
Splitting up the model gets me the following error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
- Changing the formula is not an option, as each effect is relevant.
- Making the 'smaller pieces' bigger is not an option, as the chunks will get too big and slow down performance.
- I have no clue which columns are to blame, and the columns that trigger the error may differ from chunk to chunk.
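
To narrow down the last point, something like the following might help. It is only a sketch in base R: the helper name `single_level_cols` is made up for illustration, and it assumes the error comes from a factor or character column that takes a single distinct value within a chunk. That is a common cause of this message, but I have not confirmed it for my data.

```r
# Hypothetical helper (not part of speedglm): return the names of factor or
# character columns that have fewer than two distinct non-missing values.
single_level_cols <- function(chunk) {
  # Flag the categorical columns
  is_cat <- vapply(chunk, function(x) is.factor(x) || is.character(x), logical(1))
  # Count the values actually observed in this chunk (unused levels are ignored)
  n_obs <- vapply(chunk[is_cat], function(x) length(unique(x[!is.na(x)])), integer(1))
  names(chunk)[is_cat][n_obs < 2]
}

single_level_cols(iris[1:10, ])  # "Species": rows 1-10 are all setosa
single_level_cols(iris)          # character(0): all three species are present
```

Running this on each chunk before the update would at least show which columns differ from chunk to chunk.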
What are my options?