I'm trying to impute a large data set with the Amelia package. When calling the amelia function I get this error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
contrasts can be applied only to factors with 2 or more levels

Since none of my factor variables has only one level, I started removing variables one at a time to see which one causes the problem. I tracked it down to the following numeric variable; amelia works when I remove this single variable from the data set:

> str(train$ABC)
 num [1:1600] 5.19 5.38 5.59 5.26 5.12 ...

However, there is nothing strange about this variable:

> summary(train$ABC)
 Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
4.533   5.166   5.328   5.434   5.557   7.914     610 

> summary(na.omit(train))

        ABC              ...
   Min.   :4.533   
   1st Qu.:5.196  
   Median :5.384   
   Mean   :5.512  
   3rd Qu.:5.668   
   Max.   :7.520

> var(train$ABC,na.rm=T)
     [1] 0.1969697

> aa <- na.omit(train)
> var(aa$ABC)
 [1] 0.2500173

I'm sorry I cannot provide the full data set, as this is a medical study I'm working on, and I could not come up with a trivial example that reproduces the error.

What else could cause this error? Where should I start looking? Thanks.

spore234

1 Answer

I had the same problem today and it was due to the many missing values.

Basically, even if your original data frame df contains only factors with >= 2 levels, when you use df inside lm() the incomplete observations are dropped (at least those with missing values in the variables you are using). So it is not df that you need to check for factors with < 2 levels, but df[complete.cases(df), ]. In that data frame, at least one factor variable will be left with only 1 level (check levels(droplevels(x))).
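
A minimal sketch of that check, in case it helps. The data frame and its columns (y, x, grp) are made up for illustration and have nothing to do with the data in the question; the sketch only shows how a factor with two levels in the full data can end up with a single level once the incomplete rows are dropped. Whether amelia drops rows in the same way internally is an assumption I have not verified.

```r
# Hypothetical toy data: `grp` has two levels overall, but every "b" row
# has a missing value in `x`.
df <- data.frame(
  y   = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  x   = c(5.1, 5.3, NA, NA, 5.6, 5.2),
  grp = factor(c("a", "a", "b", "b", "a", "a"))
)

# Keep only the complete rows, as lm() does with its default na.action.
cc <- df[complete.cases(df), ]

# For every factor column, count the levels that are still in use.
is_fac   <- vapply(cc, is.factor, logical(1))
n_levels <- vapply(cc[is_fac],
                   function(col) nlevels(droplevels(col)),
                   integer(1))

# Factors left with fewer than 2 levels are the ones that break contrasts.
bad <- names(n_levels)[n_levels < 2]
print(bad)
```

Any column name printed at the end is a factor that has fewer than 2 levels among the complete cases, so it is a candidate for the contrasts error even though it looks fine in the full data frame.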

I guess you guys already solved it, but maybe it will help somebody else in the future!

jeiroje