
I do not understand why I am receiving this message:

Error in train(Medal ~ ., data = training1, method = "rf", ntree = 5) : unused arguments (data = training1, method = "rf", ntree = 5)

in my random forest code. I have run a random forest exactly like this before, but I am only now getting this error. Any ideas?

library(caret)
library(randomForest)

OlympicData <- read.csv(file.choose(), header = TRUE)

#convert NAs in Medal column to NoMedal
levels <- levels(OlympicData$Medal)
levels[length(levels) + 1] <- "NoMedal"
OlympicData$Medal <- factor(OlympicData$Medal, levels = levels)
OlympicData$Medal[is.na(OlympicData$Medal)] <- "NoMedal"
summary(OlympicData)

#remove unnecessary columns
OlympicData <- OlympicData[, -1]
OlympicData <- OlympicData[, -1]
OlympicData <- OlympicData[, -7]
OlympicData <- OlympicData[, -6]
summary(OlympicData)

#remove remaining NAs (assign the result, otherwise the filtered rows are only printed)
OlympicData <- OlympicData[complete.cases(OlympicData), ]

#train model
set.seed(33)
Data_Splitting <- createDataPartition(OlympicData$Medal, p=0.75, list=FALSE)
training1 <- OlympicData[Data_Splitting, ]
testing1 <- OlympicData[-Data_Splitting, ]

rf <- train(Medal ~., data = training1, method = "rf", ntree = 5)
TFurrer
  • `ntree` is the wrong parameter for the `train` function with the rf method. `mtry` works. – Sang won kim Nov 20 '19 at 04:57
  • That is not what I have been taught, nor does it solve my error. – TFurrer Nov 20 '19 at 04:59
  • Sorry, `mtry` (Randomly Selected Predictors). – Sang won kim Nov 20 '19 at 05:00
  • Please refer to this document: http://topepo.github.io/caret/train-models-by-tag.html#Random_Forest – Sang won kim Nov 20 '19 at 05:03
  • Error and code don't match. The error suggests a model `train(training1 ~ ., data = training1, method = "rf", ntree = 5)` whereas your code has `train(Medal ~., data = training1, method = "rf", ntree = 5)`. Notice the difference in the `formula`. Is this a typo? For what it's worth @Sangwonkim, having an `ntree` parameter is absolutely fine; see e.g. `train(Species ~ ., data = iris, method = "rf", ntree = 5)`. – Maurits Evers Nov 20 '19 at 05:03
  • Definitely a typo, thanks for catching that. Error says train(Medal ~. .... – TFurrer Nov 20 '19 at 05:06
  • @TFurrer In that case your question is not reproducible without minimal sample data. All I can say is that based on the `iris` sample data, `train(Species ~ ., data = iris, method = "rf", ntree = 5)` works as expected. Please add [reproducible & minimal sample data](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (see the link for details on how to do that). – Maurits Evers Nov 20 '19 at 05:10
  • PS. Could there be a name conflict of two different `train` functions? Can you try with `caret::train(Medal ~., data = training1, method = "rf", ntree = 5)`? – Maurits Evers Nov 20 '19 at 05:14
  • @MauritsEvers I used the athlete_events.csv file from the 120 Years of Olympic Data dataset from Kaggle https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results – TFurrer Nov 20 '19 at 05:15
  • @MauritsEvers caret::train worked! Thank you! – TFurrer Nov 20 '19 at 05:21
  • @Tfurrer So there must've been a name conflict. Perhaps you defined your own `train` function somewhere and it's still in your global environment due to a resumed R session; or perhaps it's something in your `.Rprofile`. Either way, and for future posts, always include minimal sample data. Linking to a Kaggle dataset is not a good idea, as it requires us to sign up and go through the data download process. I can guarantee that that will put off a lot of people. Think about making it as easy as possible for others to help. Include minimal but representative sample data in your post. – Maurits Evers Nov 20 '19 at 05:24
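
For anyone hitting the same error, here is a minimal sketch of the name conflict discussed in the comments above. It assumes `caret` and `randomForest` are installed and uses the built-in `iris` data instead of the Olympic dataset. The one-argument `train` defined here is a hypothetical stand-in for whatever was actually masking `caret::train` in the original session, which the thread never identified.

```r
library(caret)

# Hypothetical stand-in for the object that masked caret::train
# (the real culprit in the question is unknown).
train <- function(x) x

# Diagnose: list every attached location that defines 'train',
# and check which environment the first match comes from.
print(find("train"))
print(environmentName(environment(train)))  # "R_GlobalEnv" for a user-defined copy

# The masked call fails: the formula is matched to 'x',
# and every other argument is reported as unused.
res <- try(train(Species ~ ., data = iris, method = "rf", ntree = 5),
           silent = TRUE)
cat(conditionMessage(attr(res, "condition")), "\n")

# Fix 1: qualify the call so the caret version is used explicitly.
set.seed(33)
fit <- caret::train(Species ~ ., data = iris, method = "rf", ntree = 5)

# Fix 2: remove the masking object so a plain train() finds caret again.
rm(train)
print(find("train"))
```

If `find("train")` lists `.GlobalEnv` or another package ahead of `package:caret`, that earlier definition is the one being called. As Maurits Evers notes above, `ntree` itself is fine here: `caret::train` forwards extra arguments to the underlying `randomForest` call.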

0 Answers