
The following is a piece of code that I am implementing:

train.data <- data.frame(cbind(all[1:end_trn,-1],Response))

#model building
train.task <- makeClassifTask(data = train.data[1:round(end_trn*train.fact),], target = "Response")
test.task <- makeClassifTask(data = train.data[(round(end_trn*train.fact)+1):end_trn,], target = "Response")

lrn = makeLearner("classif.xgboost")
lrn$par.vals = list(nrounds = 10,
        print.every.n = 5,
        objective = "multi:softmax",
        #num_class = 9,
        depth = 4,
        eta = 0.05,
        colsample_bytree = 0.66,
        min_child_weight = 4,
        subsample = 0.91)

model <- train(lrn, train.task)
pred <- predict(model, train.task)

While executing the last command of the code, I get the following error:

Error in data.frame(id = 1:47505, truth = c(8L, 4L, 8L, 8L, 8L, 8L, 8L,  : 
  arguments imply differing number of rows: 47505, 5938

I ran the same script on the simple iris data set and it runs fine. What does "arguments imply differing number of rows: 47505, 5938" mean? The training set has 47505 rows, so what does 5938 indicate? (The library used is 'mlr'.)

Thanks in advance,

  • I have had this, but I can't remember what it was... – Mike Wise Jan 04 '16 at 17:59
  • It would be great if you could supply a minimal reproducible example to go along with your question: something we can work from and use to show you how it might be possible to answer it. That way others can also benefit from your question, and the accompanying answer, in the future. You can have a look at [this SO post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example in R. – Eric Fail Jan 04 '16 at 22:36
  • It would also be good if you could tell us where exactly the error is coming from. – Lars Kotthoff Jan 04 '16 at 22:38

1 Answer


Some of your data may contain NA values, and the predict function does not impute NA values in the features used by the learned classifier. In that case, the error message can be read as: "only 5938 of the 47505 rows are valid". Discard the rows that contain NA before training and predicting.
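
A minimal sketch of that check, using base R only. The small data frame below is a made-up stand-in for train.data from the question, and the column names x1 and x2 are hypothetical. Whether missing values really are the cause in the original data is an assumption that this check would confirm or rule out.

```r
# Hypothetical stand-in for train.data: a few rows with some missing values.
train.data <- data.frame(
  x1 = c(1.2, NA, 3.4, 4.1, 5.0),
  x2 = c(0.5, 0.7, NA, 0.9, 1.1),
  Response = factor(c("a", "b", "a", "b", "a"))
)

# Count missing values per column to see which features are affected.
na.per.column <- colSums(is.na(train.data))

# Keep only the rows that have no missing value in any column.
complete.rows <- complete.cases(train.data)
train.clean <- train.data[complete.rows, ]

# Compare the row counts before and after dropping incomplete rows.
n.before <- nrow(train.data)
n.after <- nrow(train.clean)
```

If n.after is smaller than n.before, the cleaned data frame (train.clean here) is what would be passed to makeClassifTask in place of the original.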

Bentoy13