
I keep getting the same error...

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

... when I run the following...

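library(caret)  # provides trainControl() and train()

set.seed(707)

train.control <- trainControl(method = "cv", number = 10)
heart.train.cca <- na.omit(heart.train)  # complete-case analysis: drop rows containing NAs

model <- train(diabetes ~ ., data = heart.train.cca, method = "rpart", trControl = train.control)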

I saw a previous thread on here (Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels) explaining that every factor variable needs to have at least 2 levels, and I noticed that I had forgotten to set levels() on one of my variables. I fixed that, re-ran the code, and got the same error.

Here are the variables for context:

> str(heart)
'data.frame':   299 obs. of  11 variables:
 $ anaemia                 : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 2 2 1 2 ...
 $ creatinine_phosphokinase: int  582 7861 146 111 160 47 246 315 157 123 ...
 $ diabetes                : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
 $ ejection_fraction       : int  20 38 20 20 20 40 15 60 65 35 ...
 $ high_blood_pressure     : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 1 2 ...
 $ platelets               : num  265000 263358 162000 210000 327000 ...
 $ serum_creatinine        : num  1.9 1.1 1.3 1.9 2.7 2.1 1.2 1.1 1.5 9.4 ...
 $ serum_sodium            : int  130 136 129 137 116 132 137 131 138 133 ...
 $ sex                     : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 1 2 ...
 $ smoking                 : Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 2 1 2 ...
 $ DEATH_EVENT             : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...

Data is from: https://www.kaggle.com/andrewmvd/heart-failure-clinical-data

smci
Adam
  • Note that a factor variable can have multiple levels, but still have only one level appearing in the data. What gets returned if you run `table(heart$DEATH_EVENT)`? – Phil Dec 09 '20 at 23:47
  • `table(heart$DEATH_EVENT)` gives 203 for level `0` and 96 for level `1`. – Adam Dec 10 '20 at 03:18
  • Your title is a red herring and won't attract answers. The issue is about contrasts on factors, not about the type of model (CART) or the 10-fold cross validation. Please edit the title to state the actual issue. – smci Dec 10 '20 at 03:28
  • Simple tip for debugging model-building: start by using one or two columns. Add columns one-at-a-time until you find the problem column. **In your case, it'll likely be that one of your factor columns contains NAs.** It's your debugging job to find out which column. This question is a duplicate. – smci Dec 10 '20 at 03:33
  • Hi @Adam, I looked at the data, it is not very clear how you derive the train dataset there. If I try to factor the columns like you did, I don't get the error. So I suspect there is something wrong with how you are processing the dataset, and that's not very apparent from your question – StupidWolf Dec 10 '20 at 04:08
  • If I do ```heart = read.csv("heart_failure_clinical_records_dataset.csv") ; heart = heart[,c(2:11,13)] ; heart$diabetes = factor(heart$diabetes) ; train(diabetes~., data=heart, method="rpart",trControl=train.control)``` it works pretty ok – StupidWolf Dec 10 '20 at 04:10
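
The check suggested in the comments (a factor can declare two levels while only one actually appears in the rows that reach the model) could be sketched roughly as follows. This is only an illustration in base R: the data frame `toy` and the helper `observed_levels()` are hypothetical stand-ins, not part of the question's data or of caret, and whether unused levels get dropped before the contrasts are built depends on how the model frame is constructed.

```r
# Hypothetical toy data standing in for heart.train (NOT the real dataset).
toy <- data.frame(
  diabetes  = factor(c("0", "1", "0", "1", "1")),
  smoking   = factor(c("0", "0", "0", "0", "1")),
  platelets = c(265000, 263358, 162000, 210000, NA)
)

# Same complete-case step as in the question: the row with the NA is dropped,
# and with it the only observation where smoking == "1".
toy.cca <- na.omit(toy)

# Hypothetical helper: count the levels that actually occur in each factor column.
observed_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  vapply(df[is_fac], function(x) nlevels(droplevels(x)), integer(1))
}

lv <- observed_levels(toy.cca)
print(lv)

# Factor columns with fewer than 2 observed levels are candidates for the
# "contrasts can be applied only to factors with 2 or more levels" error.
suspects <- names(lv)[lv < 2]
print(suspects)
```

Running the same helper on `heart.train.cca` instead of `toy.cca` would show whether any factor column is left with a single observed level after `na.omit()`.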
