
I am trying to use predict.boosting on new data with the adabag package. I can't find a way to use it (or any other function from that package) on data without labels.

I am trying:

pr <- predict.boosting(modelfit, test[,2:ncol(test)])

It gives:

Error in `[.data.frame`(newdata, , as.character(object$formula[[2]])) : 
  undefined columns selected

However, if I include labels:

pr <- predict.boosting(modelfit, test)

it works just fine. But there has to be a way to use it as a predictive model for data without labels.

Thanks for any help!

EDIT Example from package:

library(rusboost)
library(rpart)
data(iris)

# make it an unbalanced dataset by removing most of the setosa observations
df <- iris[41:150, ]

# create binary variable
df$Setosa <- factor(ifelse(df$Species == "setosa", "setosa", "notsetosa"))

# create index of negative examples
idx <- df$Setosa == "notsetosa"

# run model
test.rusboost <- rusb(Setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                      data = df, boot = FALSE, iters = 20, sampleFraction = .1, idx = idx)

# works: newdata still contains the label column
predict.boosting(test.rusboost, df)
# fails: the label column has been dropped
predict.boosting(test.rusboost, df[, 1:4])
aqua
  • It would be easier to help you if you provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Aug 21 '17 at 15:02
  • Thanks for commenting,I added one. – aqua Aug 22 '17 at 07:33

2 Answers

You should check that all the columns in train (the set you used to train the model) are present in test, and with the same names.

Please check:

all(colnames(train) %in% colnames(test))

If it's FALSE, you will need to review how you built train and test.

If it's TRUE, and in general, please provide a reproducible example.
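
If the check returns FALSE, it helps to see which columns are missing rather than only that something is missing. A minimal base R sketch, where `train` and `test` are hypothetical stand-ins built from iris (not the asker's data) and the label column is dropped from `test` on purpose:

```r
# Hypothetical stand-ins for the train and test sets discussed above.
train <- iris[1:100, ]
test  <- iris[101:150, 1:4]   # label column (Species) dropped on purpose

# Columns the model saw during training that are absent from test.
missing_cols <- setdiff(colnames(train), colnames(test))
print(missing_cols)

# TRUE only when every training column is also present in test.
all_present <- length(missing_cols) == 0
print(all_present)
```

Here `missing_cols` names the dropped label column, which is the same situation that triggers the error in the question.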

Edit:

A nice way to check that the columns are the same, and that they contain the same factor levels, is to use sameShape from the dataPreparation package. If that's not the case, it will add the missing levels and columns (and warn you).

To use it:

library(dataPreparation)
test <- sameShape(test, train)
Emmanuel-Lin
  • Thanks for answering. I checked, it gives TRUE. It only produces error when I omit the labels. I gave an example above. – aqua Aug 21 '17 at 16:05
  • Before applying your model, you can use the function `sameShape` from the dataPreparation package. It will check that you have the same columns and the same levels in them. – Emmanuel-Lin Aug 21 '17 at 18:34

I came up with a workaround: I attached a column with the same name as the labels to my newdata and filled it with random factor levels.

df$Setosa <- factor(sample( c("setosa",  "notsetosa"), nrow(df), replace=TRUE, prob=c(0.5, 0.5) ))

Then it works just fine.
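
The same idea can be wrapped in a small helper. This is only a sketch: `add_placeholder_label` is a hypothetical function of my own, not part of adabag or rusboost, and it fills the column with one constant level instead of random ones, so the result is reproducible. Judging by the error message, the predict method looks up the response column by name in newdata, so any valid factor with the right name and levels should be enough.

```r
# Hypothetical helper (not part of adabag): add a dummy label column so the
# predict method can find the response variable it looks up in newdata.
add_placeholder_label <- function(newdata, label, levels) {
  # Fill with the first level; the values are placeholders only.
  newdata[[label]] <- factor(rep(levels[1], nrow(newdata)), levels = levels)
  newdata
}

# Unlabeled data: only the four predictor columns from the question's example.
unlabeled <- iris[41:150, 1:4]
padded <- add_placeholder_label(unlabeled, "Setosa", c("notsetosa", "setosa"))
str(padded$Setosa)

# Assumed usage with the fitted model from the question (needs rusboost/adabag):
# pr <- predict.boosting(test.rusboost, padded)
```

Keep in mind that any confusion matrix or error rate in the returned object is computed against the placeholder column, so it is meaningless here; only the predicted classes are of interest.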

aqua