I'm a total beginner with R. I know this error has been addressed before, but the available answers do not solve my problem.

I'm trying to apply a naive Bayes classifier to a test dataset that has fewer variables (columns) than the training dataset used to create the classifier. In other words, I built the classifier to predict customers' membership in certain segments from 8 independent variables, and it worked fine on the test dataset, which has the same variables as the training data. Now I want to test how the model performs when the data does not include all the variables from the training data (for example, if I only have the customers' demographics). So I selected certain variables (columns) from the test data like this:

data.test2 <- data.test[,c(1,2,5,6,8)] 

The resulting test data includes only five of the original eight independent variables in the training set. I also removed the response variable (column 9).
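A minimal sketch of the same subsetting done by column name instead of position, which makes it easier to verify which predictors are kept and that the response is gone. The data frame below and its column names (`v1` to `v8`, `segment`) are invented placeholders, not the actual data.

```r
# Hypothetical stand-in for data.test: 8 predictors plus a response in column 9
data.test <- data.frame(
  v1 = 1:6, v2 = 6:1, v3 = c(2, 4, 6, 8, 10, 12), v4 = rep(0, 6),
  v5 = c(1, 0, 1, 0, 1, 0), v6 = 11:16, v7 = 21:26, v8 = 31:36,
  segment = factor(c("a", "b", "a", "b", "a", "b"))
)

# Positional subsetting, as in the question
data.test2 <- data.test[, c(1, 2, 5, 6, 8)]

# The same selection by name (placeholder names)
keep <- c("v1", "v2", "v5", "v6", "v8")
data.test2.byname <- data.test[, keep, drop = FALSE]

# Show which columns survived and confirm both selections agree
print(names(data.test2))
print(identical(data.test2, data.test2.byname))
```

Selecting by name also keeps working if the column order of the test data ever changes.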

However, I get this error:

Error in `[[<-.data.frame`(`*tmp*`, i, value = integer(0)) :
  replacement has 0 rows, data has 207

I made sure the variable names are exactly the same as in the training data. My understanding from the package documentation is that this should not be a problem:

"New Data: A dataframe with new predictors (with possibly fewer columns than the training data). Note that the column names of newdata are matched against the training data ones."
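One way to double-check the name matching is sketched below in base R. It is only an illustration: `data.train`, the column names, and the deliberate `V2` versus `v2` mismatch are made-up assumptions, and it does not claim to identify the cause of the error above.

```r
# Hypothetical stand-ins for the training data and the reduced test data
data.train <- data.frame(
  v1 = 1:4, v2 = c("x", "y", "x", "y"), v3 = 4:1,
  segment = factor(c("a", "b", "a", "b")),
  stringsAsFactors = TRUE
)
data.test2 <- data.frame(v1 = 5:6, V2 = c("x", "y"), stringsAsFactors = TRUE)

# Predictor names used in training (everything except the response)
predictors <- setdiff(names(data.train), "segment")

# Columns in the new data with no exact, case-sensitive match in training
unmatched <- setdiff(names(data.test2), predictors)

# Training predictors that are absent from the new data
missing_predictors <- setdiff(predictors, names(data.test2))

# For shared columns, check that the classes agree (e.g. factor vs. character)
shared <- intersect(names(data.test2), predictors)
class_match <- vapply(
  shared,
  function(nm) identical(class(data.test2[[nm]]), class(data.train[[nm]])),
  logical(1)
)

print(unmatched)
print(missing_predictors)
print(class_match)
```

An empty `unmatched` vector and all-`TRUE` `class_match` would indicate that every column in the new data lines up with a training predictor.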

Any ideas? Thank you!

duckmayr
Samsel
  • Welcome to Stack Overflow! You may want to check out [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In particular, editing your question to include (at least some of) your data by using the output of `dput()` and the minimum amount of your code needed to reproduce the error you got (including any relevant calls to `library()`) would be helpful. – duckmayr Nov 26 '18 at 22:35

0 Answers