Here is the code:

data_complete <- read.delim("D:/Work/output_java_head.txt") # complete data set #working
modelfn <- function(data_complete) {
  model <- lm(mctr ~ price + age_group + gender + brand + product_typeid + google_product_category, data = data_complete)
  data_complete$predicted <- predict(model, data_complete)
  return(data_complete$predicted)
  sink()
  write.csv("D:/Work/output", i, ".csv")
  rm(model)
  gc(TRUE)
} #working

Then using this command:

by(data_complete,data_complete$google_product_category,modelfn)

I got this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Note that data_complete$google_product_category has 117 levels, and data_complete has 22 columns in total.

As a test, I also ran:

by(data_complete,data_complete$google_product_category,summary)

It gave the correct result.

So I guess the problem is somewhere in the modelfn function I wrote.

lmo
heybhai
  • Pretty straightforward error: you have a variable with only one level. I imagine it could be due to missingness, but it's impossible to know without looking at the data. – Thomas Apr 15 '14 at 09:04
  • @Thomas as I mentioned above, there are 117 levels for the variable I am modelling, and I have checked this using "levels" and "summary" in the R console. I can't share the data because it is confidential, but I can tell you it has been thoroughly cleaned and tested several times. – heybhai Apr 15 '14 at 09:15

1 Answer


I'm not sure if you believed me when I wrote my comment, but this is a very straightforward error related to the fact that one of your variables has only one observed level. Here's a simple example to demonstrate it:

> x <- factor(rep(1,100), levels=1:20)
> y <- rnorm(100)
> lm(y~x)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Make sure your data actually look the way you think they do. Even if there are multiple factor levels attached to a variable, it's likely that the variable lacks actual observations at more than one level. Again, we can't really help you if you can't share the data, so you'll have to look for yourself for where this is occurring.
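One way to look for where this is occurring is to count the levels that are actually observed, rather than the levels that are declared, for every factor within each group. The sketch below uses a made-up data frame `toy` and a hypothetical helper `observed_levels()`; neither comes from the question. It also illustrates a point that follows from how by() works: splitting on a variable leaves exactly one observed value of that variable in each subset, so a grouping variable that also appears in the model formula would trigger this error on its own.

```r
# Toy data standing in for data_complete (all names and values are made up).
set.seed(1)
toy <- data.frame(
  mctr     = rnorm(12),
  price    = runif(12),
  gender   = factor(rep(c("f", "m"), 6)),
  brand    = factor(c(rep("a", 4), rep(c("a", "b"), 4))),
  category = factor(rep(c("c1", "c2", "c3"), each = 4))
)

# Declared levels versus levels that actually occur in the data.
x <- factor(rep(1, 100), levels = 1:20)
nlevels(x)              # number of declared levels
nlevels(droplevels(x))  # number of observed levels

# Hypothetical helper: observed level count for every factor column of d.
observed_levels <- function(d) {
  is_fac <- vapply(d, is.factor, logical(1))
  vapply(d[is_fac], function(f) nlevels(droplevels(f)), integer(1))
}

# One row per category, one column per factor. Any entry below 2 marks a
# factor that lm() cannot build contrasts for within that group.
counts <- t(sapply(split(toy, toy$category), observed_levels))
counts
```

In this toy data the `category` column has one observed level in every group, and `brand` has one observed level in group `c1`, so both would need to be handled before fitting per-group models.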

Thomas
  • Hey Thomas, when I ran summary(x) on your example, it showed 100 observations at level 1 and 0 at each of levels 2 through 20. This suggests that the other 19 levels have no data, or not enough data to fit a linear model, and I guess that might be the root cause of the error in my dataset. What do you think? – heybhai Apr 15 '14 at 10:42
  • Yes, that's exactly the error - even though there are levels for the factor, there are not any observations that take those levels. – Thomas Apr 15 '14 at 11:44
  • Thanks a lot @Thomas!!! Now can you suggest how to counter this? I am thinking of 2 options: 1) using a try-catch, which I don't know how to implement; 2) skipping the faulty levels. Am I going in the right direction? – heybhai Apr 15 '14 at 11:56
  • @heybhai I'd suggest you create a small example dataset that creates the problem and post this as a new question about how to best handle it. – Thomas Apr 15 '14 at 12:02
  • Sure @Thomas, I will post it today. Hopefully you will be there to sort it out. – heybhai Apr 15 '14 at 12:09
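
The two options raised in the comments (a try-catch, and skipping the faulty factors) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame `toy` and the helpers `fit_or_null()` and `fit_usable()` are made up for the example and are not from the thread, and the formula is deliberately shorter than the one in the question.

```r
# Toy data (made up): brand has a single observed level within category "c1".
set.seed(1)
toy <- data.frame(
  mctr     = rnorm(12),
  price    = runif(12),
  brand    = factor(c(rep("a", 4), rep(c("a", "b"), 4))),
  category = factor(rep(c("c1", "c2", "c3"), each = 4))
)
groups <- split(toy, toy$category)

# Option 1: wrap the fit in tryCatch() and return NULL for groups that fail.
fit_or_null <- function(d) {
  tryCatch(
    lm(mctr ~ price + brand, data = d),
    error = function(e) {
      message("Skipping group: ", conditionMessage(e))
      NULL
    }
  )
}
fits <- lapply(groups, fit_or_null)

# Option 2: keep only predictors that are usable within the group, i.e.
# non-factors and factors with at least 2 observed levels.
# (Assumes at least one predictor remains; reformulate() needs one.)
fit_usable <- function(d, response = "mctr", predictors = c("price", "brand")) {
  usable <- Filter(function(p) {
    !is.factor(d[[p]]) || nlevels(droplevels(d[[p]])) >= 2
  }, predictors)
  lm(reformulate(usable, response = response), data = d)
}
fits2 <- lapply(groups, fit_usable)
```

Option 1 leaves gaps (a NULL entry for each failing group), while option 2 fits every group but with a different set of predictors per group, so the resulting models are not directly comparable. Which trade-off is acceptable depends on how the predictions will be used.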