I am getting an error while running a naive Bayes classifier in R. I am using the following code:

```r
mod1 <- naiveBayes(factor(X20) ~ factor(X1) + factor(X2) + factor(X3) + factor(X4) +
                     factor(X5) + factor(X6) + factor(X7) + factor(X8) + factor(X9) +
                     factor(X10) + factor(X11) + factor(X12) + factor(X13) + factor(X14) +
                     factor(X15) + factor(X16) + factor(X17) + factor(X18) + factor(X19),
                   data = intent.test)
```

```r
res1 <- predict(mod1)$posterior
```

The first part of this code runs fine, but when I try to predict the posterior probabilities it throws the following error:

**Error in as.data.frame(newdata) : 
argument "newdata" is missing, with no default**

I tried running something like

```r
res1 <- predict(mod1, new_data = intent.test)$posterior
```

but this also gives the same error.

Paul Hiemstra
SumitGupta
The correct spelling is `newdata`, with no underscore (as in the error message), but it is an optional parameter: it should work without it. There may be something untoward with your dataset, but you do not give any information about it. Having the data already encoded as factors in the data.frame may help. If you are trying to predict the last column from the other ones, the model can be written more compactly as `X20 ~ .`. – Vincent Zoonekynd Feb 06 '12 at 09:07

1 Answer


You seem to be using the e1071::naiveBayes algorithm, whose predict method requires a `newdata` argument; this explains both errors. The first call omits the argument altogether, and the second passes `new_data`, which does not match `newdata`, so the argument is still missing. (You can check the source code of the predict.naiveBayes function on CRAN; the second line of the function is `newdata <- as.data.frame(newdata)`.) Also, as pointed out by @Vincent, you're better off converting your variables to factors before calling the NB algorithm, although this has nothing to do with the above errors.
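As a minimal sketch of the e1071 route, the snippet below passes `newdata` explicitly. The data frame `toy` and its columns are made up for illustration and only stand in for `intent.test`. Note also that, as far as I can tell, e1071's predict method returns either a factor of classes or, with `type = "raw"`, a matrix of posterior probabilities, so there is no `$posterior` element to extract.

```r
library(e1071)

# Hypothetical toy data standing in for intent.test (random, illustration only)
set.seed(1)
toy <- data.frame(
  X1  = sample(c("a", "b"), 100, replace = TRUE),
  X2  = sample(c("u", "v", "w"), 100, replace = TRUE),
  X20 = sample(c("no", "yes"), 100, replace = TRUE)
)

# Convert every column to a factor once, instead of wrapping each term in factor()
toy[] <- lapply(toy, factor)

# Compact formula: predict X20 from all other columns
mod <- naiveBayes(X20 ~ ., data = toy)

# newdata must be given by its exact name (or by position)
post <- predict(mod, newdata = toy, type = "raw")  # matrix of posterior probabilities
pred <- predict(mod, newdata = toy)                # factor of predicted classes

head(post)
```

Here `post` has one row per observation and one column per level of `X20`, which plays the role of the `$posterior` element the question was after.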

With NaiveBayes from the klaR package, this problem does not arise. E.g.,

```r
library(klaR)

# spam data set shipped with the ElemStatLearn package
data(spam, package = "ElemStatLearn")

# set up a training sample (two thirds of the rows)
train.ind <- sample(seq_len(nrow(spam)), ceiling(nrow(spam) * 2 / 3), replace = FALSE)

# apply NB classifier
nb.res <- NaiveBayes(spam ~ ., data = spam[train.ind, ])

# predict on holdout units
nb.pred <- predict(nb.res, spam[-train.ind, ])

# but this also works on the training sample, i.e. without using a `newdata`
# (predict returns a list, so look at the first rows of its posterior matrix)
head(predict(nb.res)$posterior)
```
chl