My dataset is too big and my formula too complicated to fit with biglm, fastLm, speedlm or lm in one go. I have therefore resorted to splitting the dataset into smaller pieces and updating the model with every 50,000 rows.

Below is a simplified version of what I'm doing, with the iris dataset standing in for my own.

library(speedglm)
chunk1 <- iris[1:10,]
chunk2 <- iris[11:20,]
chunk3 <- iris[21:30,]
lmfit  <- speedlm(Sepal.Length ~ Sepal.Width + Species, chunk1)

# Fold the remaining chunks into the running fit, one at a time
for (chunk in list(chunk2, chunk3)) {
  lmfit <- updateWithMoreData(lmfit, chunk)
}
lmfit

Fitting the model in pieces like this gives me the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
contrasts can be applied only to factors with 2 or more levels
  • Changing the formula is not an option, as each effect is relevant.
  • Making the 'smaller pieces' bigger is not an option, as each piece would get too big and slow down performance.
  • I have no clue which columns cause the error, and which columns do so may differ from one run to the next.

What are my options?

Bas
  • Split the data differently. `iris[1:10,]` contains only one `Species`. – Roland Oct 14 '15 at 07:11
  • @Roland I have no clue which columns are erroneous, it may also differ at times which columns will give this error. I also have this problem for a whole bunch of different formulas so figuring out how to split each column differently is not really an option. There can also be multiple factors with only one level at a time. – Bas Oct 14 '15 at 07:22
  • Have you factored the data before you split it? Otherwise you may get different levels in each split, which will cause the error. – JohannesNE Oct 14 '15 at 07:46
  • @JohannesNE No, I've basically got my code as in the example above, will read up about factoring – Bas Oct 14 '15 at 07:54
  • @JohannesNE I have been trying to work out how to factor the data, without success so far. I have made a [simplified example here](http://stackoverflow.com/questions/33143257/factoring-for-linear-models-create-lm-with-one-factor). Hopefully you can help me out :) – Bas Oct 15 '15 at 10:04
  • See my answer in your new post. – JohannesNE Oct 15 '15 at 10:28
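
The comments above point at categorical columns that have only one observed level inside a chunk. Here is a minimal base-R sketch of two ideas that follow from that: checking a chunk for such columns before fitting, and shuffling the rows before chunking so that every chunk is likely to contain every level. `single_level_cols` is a hypothetical helper, not part of speedglm. The sketch assumes the data can be reordered at all, which may not hold for data that does not fit in memory, and shuffling does not guarantee that rare levels show up in every chunk.

```r
# Hypothetical helper (not part of speedglm): names of the factor or
# character columns in `chunk` with fewer than two distinct observed values.
single_level_cols <- function(chunk) {
  is_cat <- vapply(chunk, function(x) is.factor(x) || is.character(x), logical(1))
  n_obs  <- vapply(chunk[is_cat],
                   function(x) length(unique(x[!is.na(x)])),
                   integer(1))
  as.character(names(n_obs)[n_obs < 2])
}

# Rows 1-10 of iris are all setosa, so Species is flagged for this chunk.
single_level_cols(iris[1:10, ])

# Assumption: rows may be reordered. Shuffle once, then cut into chunks of
# 50 rows, so each chunk is a random sample rather than a sorted block.
set.seed(1)
shuffled <- iris[sample(nrow(iris)), ]
chunk_id <- ceiling(seq_len(nrow(shuffled)) / 50)
chunks   <- split(shuffled, chunk_id)

# Number of problematic columns per chunk; inspect before fitting.
vapply(chunks, function(ch) length(single_level_cols(ch)), integer(1))
```

Running the check on each chunk before calling `speedlm` or `updateWithMoreData` at least shows which columns are at fault for that chunk, which is the part the question says is unknown.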

0 Answers