R: Error in for loop - predicting multiple columns Random forest

Question

I am using a for loop because I need to predict multiple columns and store the predictions as I go.

cols is a vector containing the names of all the columns I need to predict; mat is a data.frame (basically my text features).

df is the main data frame, holding the text and the columns to predict.

for (colm in cols){
  label <- as.factor(df[[colm]])
  dfm <- mat
  dfm[[colm]] <- label

  #Boruta(as.factor(colm)~., data=dfm, pValue = 0.01, mcAdj = TRUE, maxRuns = 20,
  #       doTrace = 2, holdHistory = TRUE, getImp = getImpRfZ) -> Bor.rf
  #dfm <- as.data.frame(as.matrix(dfm[,getSelectedAttributes(Bor.rf)]))
  #dfm[[colm]] <- label

  #train the RF model
  modelRF.bor <- train(colm~., data=dfm, method="rf", trControl=control)

  pred.RF.bor = predict(modelRF.bor, newdata = dfm[ ,!(colnames(dfm) == st(colm))])
  print("Predictions for Column")
  print(colm)
  print(pred.RF.bor)

  table(pred.RF.bor,dfm$colm)
  acc.RF.bor = mean(pred.RF.bor==dfm$colm)
  print("Accuracy ")
  print(acc.RF.bor)
  print("Confusion Matrix")
  print(confusionMatrix(table(pred.RF.bor,dfm$colm)))

  output[,i] <- pred.RF.bor
  i = i+1
}

I am getting the error below. I have checked everything in my code and also looked at similar questions here.

Error in model.frame.default(form = colm ~ ., data = dfm, na.action = na.fail) : 
  variable lengths differ (found for 'excel')

I don't know what to do; please suggest a fix.

Please provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Maybe the problem is due to the presence of missing values? — Gilles San Martin, Feb 18 '18 at 20:19
@Gilles, there are no missing values. This is caused by how I access the columns: while training I am using a formula, `colm~., data=dfm, method="rf", trControl=control`, and there the model is not able to see colm. But when I check outside, the column is present under the exact name that colm refers to in the loop, so I think that is where the problem is. I also tried str(colm) etc., but found nothing. — Shivam, Feb 18 '18 at 20:22
OK but without a reproducible example it will be very difficult to help... — Gilles San Martin, Feb 18 '18 at 20:25
But you want the data and the whole code; I don't think that is needed, because I just need to know how to address the columns in the formula, as I am using a for loop. — Shivam, Feb 18 '18 at 20:27
I am not sure if I fully understand your question, but I suspect you're looking for something like this: https://stackoverflow.com/questions/48681041/r-loop-multiple-linear-regression-models-exclude-1-variable-at-a-time/ - check out my answer for how one can use `as.formula` in R. — kgolyaev, Feb 18 '18 at 20:37
That was helpful, @kgolyaev, but you are fitting different models; here I am using similar models, but I am accessing the columns to predict using a for loop. In this situation, how do you think as.formula can be used? I have not used it before. — Shivam, Feb 18 '18 at 20:45
@Shivam would you be able to clarify what you seek to do with a tiny data example? I don't understand what you mean by 'predict multiple columns and store them at the same time'. My understanding of random forest is that it can only predict one column at a time. If you have multiple columns to predict, you need to train separate RF models, one per target column. — kgolyaev, Feb 18 '18 at 20:50
@kgolyaev, that is what I am doing in the example: a separate model will be trained for each column, and the predictions will also be saved. Where I am stuck is this error, which I am not able to make sense of; it comes from the line where I fit the first model. — Shivam, Feb 18 '18 at 20:52
Gotcha. In this case, according to the error message, there seems to be something wrong with the `"excel"` column. Try checking if there are missing values in it. And, in general, edit your post to add the output of these two commands: `str(df)` and `summary(df)`. — kgolyaev, Feb 18 '18 at 20:58
@kgolyaev, I have attached my script and dataset. The str() and summary() output is too big to paste here: there are 82 columns in there, one id, text and 72 variables to predict. — Shivam, Feb 18 '18 at 21:11
@Shivam, I do not think it is fair to expect people to go through your entire data. Stackoverflow etiquette requires people to do some work before asking questions. Please familiarize yourself with this: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example - 99% of the time you will find your problem in the process of constructing reproducible example. The remaining 1% is what SO questions are for. — kgolyaev, Feb 18 '18 at 21:19
The dataset is only a few kilobytes. To save you time and confusion, I thought I would share the link. I am already familiar with that guide. — Shivam, Feb 18 '18 at 21:23
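
The `as.formula` suggestion from the comments can be sketched as follows. In `train(colm ~ ., ...)` the formula refers to a variable literally named `colm`; no such column exists in `dfm`, so R falls back to the loop variable itself, a length-1 character string, which is a plausible source of the "variable lengths differ" message. Building the formula from the string avoids that, and `dfm[[colm]]` replaces `dfm$colm` for the same reason. This is a minimal sketch under stated assumptions, not a tested fix for the original data: the toy `mat`, `df`, `cols`, and `control` objects below are hypothetical stand-ins, and it assumes the caret and randomForest packages are installed.

```r
library(caret)  # train(), trainControl(); method = "rf" also needs randomForest

set.seed(1)
n <- 60

# Hypothetical stand-ins for the question's objects (not the real data)
mat <- data.frame(f1 = rnorm(n), f2 = rnorm(n), f3 = rnorm(n))
df <- data.frame(
  target_a = ifelse(mat$f1 > 0, "yes", "no"),
  target_b = ifelse(mat$f2 > 0, "yes", "no")
)
cols <- c("target_a", "target_b")
control <- trainControl(method = "cv", number = 3)

# Empty data frame with n rows; one prediction column is added per target
output <- data.frame(row.names = seq_len(n))

for (colm in cols) {
  dfm <- mat
  dfm[[colm]] <- as.factor(df[[colm]])

  # Build e.g. `target_a` ~ . from the string held in colm;
  # the backticks guard against non-syntactic column names
  form <- as.formula(paste0("`", colm, "` ~ ."))
  model <- train(form, data = dfm, method = "rf", trControl = control)

  # Predict on the feature columns only (everything except the target)
  pred <- predict(model, newdata = dfm[, colnames(dfm) != colm, drop = FALSE])

  # dfm[[colm]] looks up the column named by the string;
  # dfm$colm would look for a column literally called "colm"
  acc <- mean(pred == dfm[[colm]])
  cat(colm, "training accuracy:", acc, "\n")
  print(table(predicted = pred, actual = dfm[[colm]]))

  # Store by name, so no separate counter i is needed
  output[[colm]] <- pred
}
```

Storing predictions by column name also sidesteps the uninitialised `output` and `i` in the original loop. Note that the accuracy here is measured on the training rows, as in the question, so it is optimistic.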
