
"student.por" is a dataset that I trimmed down to 648 rows so I could do four-fold cross-validation. Here is a link to the CSV file if you would like to see it: student.por

"predictorDat1" is the same dataset with just my predictor variables: I removed my chosen response variable, "romantic", which is column 23.

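student.por=student.por[-649,]   # drop row 649, leaving 648 rows (4 folds of 162)

predictorDat1<-student.por[,-23]   # every column except "romantic" (column 23)

g1=1:162     # row indices of fold 1
g2=163:324   # fold 2
g3=325:486   # fold 3
g4=487:648   # fold 4
Groups=data.frame(g1,g2,g3,g4)   # column i holds the row indices of fold i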
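A sketch of building the same four contiguous folds without hard-coding the cut points, assuming the number of rows divides evenly by the number of folds. The names n, k, fold_id and folds are hypothetical; folds[[i]] plays the role of Groups[,i].

```r
# Build k contiguous folds of equal size (assumes n %% k == 0)
n <- 648
k <- 4
fold_id <- rep(seq_len(k), each = n / k)  # 162 ones, then 162 twos, ...
folds <- split(seq_len(n), fold_id)       # list of k integer vectors of row indices

# Check that this matches the Groups data frame from the question
Groups <- data.frame(g1 = 1:162, g2 = 163:324, g3 = 325:486, g4 = 487:648)
stopifnot(all(sapply(seq_len(k), function(i) identical(folds[[i]], Groups[, i]))))
```

If the rows are ordered in some meaningful way, split(seq_len(n), sample(fold_id)) assigns rows to folds at random while keeping each fold at 162 rows.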

Now when I run the code below I get this error:

"Error in model.frame.default(formula = student.por$romantic ~ ., data = predictorDat1[-Groups[, : variable lengths differ (found for 'school')"

predictions=c()
for(i in 1:4){
  tree=rpart(student.por$romantic~., data=predictorDat1[-Groups[,i],],control=rpart.control(cp=.001)) 
  predictions_per_fold=predict(tree,type="class",newdata=predictorDat1[Groups[,i],])
  predictions=c(predictions,as.character(predictions_per_fold))
}

Does anyone know why I'm having this issue? I would be so grateful for the help.

Kai
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 18 '20 at 04:56
  • Sorry, I forgot to link my dataset. I've included the link now, so it should make a lot more sense what I'm working with. – Kai May 18 '20 at 05:26
  • Data should be included in the question itself to make it reproducible. It should not be on external sites that require additional registration. Also, we don't need your actual data. You can create fake data as long as it triggers the same error. – MrFlick May 18 '20 at 05:27
  • The thing is, it worked for other data that I tried; it just won't work for this one. That's why I decided to link the dataset: maybe someone understands why this one specifically doesn't work with my code while others do. – Kai May 18 '20 at 05:35
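Along the lines of the fake-data suggestion, here is a minimal sketch that should trigger the same "variable lengths differ" error without the CSV. It uses the built-in iris data as a stand-in (an assumption, not the asker's data) and calls model.frame() directly, which is the function named in the error message: the response is taken from the full data frame while data holds only a subset of rows.

```r
# Stand-in for the question's setup: the response comes from the full
# data frame (150 values), but `data` keeps only 100 training rows.
test_rows <- 1:50
msg <- tryCatch(
  {
    model.frame(iris$Sepal.Length ~ ., data = iris[-test_rows, -1])
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # reports "variable lengths differ", naming the first predictor column
```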

1 Answer


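This error usually means that the left-hand side of the formula does not have the same length as the right-hand side. Indeed, you forgot to subset your dependent variable student.por$romantic by your grouping variable: data=predictorDat1[-Groups[,i],] keeps only the 486 training rows, but student.por$romantic is still the full 648-element vector.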
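A small sketch of that length check, again using the built-in iris data as a stand-in for student.por (an assumption; the variable names below are hypothetical). It compares the two lengths model.frame() sees, then shows that subsetting the response with the same negative indices as the data makes them agree.

```r
test_rows  <- 1:50
train_data <- iris[-test_rows, -1]         # predictors only, training rows

length(iris$Sepal.Length)                  # 150: full response (left-hand side)
nrow(train_data)                           # 100: rows on the right-hand side

# Subset the response with the same indices, so both sides have 100 rows;
# model.frame() (called internally by rpart() and lm()) then succeeds.
y_train <- iris$Sepal.Length[-test_rows]
mf <- model.frame(y_train ~ ., data = train_data)
nrow(mf)                                   # 100
```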

The following fixed it for me:

library(rpart)

predictions=c()
for(i in 1:4){
  # subset the response with the same row indices as the predictors
  tree=rpart(student.por$romantic[-Groups[,i]]~., data=predictorDat1[-Groups[,i],],control=rpart.control(cp=.001))
  predictions_per_fold=predict(tree,type="class",newdata=predictorDat1[Groups[,i],])
  predictions=c(predictions,as.character(predictions_per_fold))
}
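An alternative sketch that avoids the mismatch altogether: keep the response column inside the data frame and write the formula as romantic ~ ., so the response and the predictors are always subset by the same rows. The data below is randomly generated stand-in data (student_fake and its columns are hypothetical, not the real CSV), and predictions are written back by row index rather than concatenated, which keeps them aligned with the rows even if the folds are shuffled.

```r
library(rpart)

# Hypothetical stand-in for student.por: 648 rows, a few made-up
# predictors, and a factor response "romantic".
set.seed(42)
n <- 648
student_fake <- data.frame(
  school    = factor(sample(c("GP", "MS"), n, replace = TRUE)),
  age       = sample(15:22, n, replace = TRUE),
  studytime = sample(1:4, n, replace = TRUE),
  romantic  = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Four contiguous folds of 162 rows, as in the question
folds <- split(seq_len(n), rep(1:4, each = n / 4))

predictions <- character(n)  # one slot per row
for (i in seq_along(folds)) {
  test_idx <- folds[[i]]
  # The response lives in `data`, so both sides are subset together
  tree <- rpart(romantic ~ ., data = student_fake[-test_idx, ],
                control = rpart.control(cp = 0.001))
  predictions[test_idx] <- as.character(
    predict(tree, newdata = student_fake[test_idx, ], type = "class")
  )
}
```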

Hope this answers your question.

Ahorn