I'm trying to run Naive Bayes classification on a few texts in R, using library(e1071) and library(tm). When I predict for a test set of 8 texts together, I get one set of posterior probabilities (shown further down; note the values for the first text). A subset of my training data looks like this:
# Output of dput(head(sample_data, 8))
subset <- structure(list(
  Text = c(
    "Kretz, the former CEO of Hanover Corporation, previously pleaded guilty to money laundering, and conspiracy to commit securities fraud, wire fraud and mail fraud.",
    "Tardón was the head of an international narcotics trafficking and money laundering syndicate which distributed over 7,500 kilograms of South American cocaine",
    "Ellison previously pleaded guilty to charges of health care fraud and money laundering",
    "Paris was under attack on Friday by ISIS",
    "A fraud of 20 million Euros have been booked against him for financing terrorist activities",
    "Black money has been a common issue with Indian progress",
    "Corruption charges has been placed against the NGO",
    "An enquiry commission has been put in place regarding the recent uproar "),
  Category = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
    .Label = c("Money laundering", "Terrorist Financing", "Bribery and Corruption"),
    class = "factor")),
  .Names = c("Text", "Category"), row.names = c(NA, -8L), class = "data.frame")
Since the training set is quite small, I used the following code to duplicate the rows and prepare my training data for modelling:
# Stack the 8 rows on top of themselves to get 16 training rows
traindata <- as.data.frame(rbind(as.matrix(subset[1:8, c(1, 2)]), as.matrix(subset[1:8, c(1, 2)])))
testdata <- structure(list(
  Text = c("he is in jail for corruption charges", "he cheat", "he is involved in a racket", "this is a violation of the law", "this bank is fraud", "gaming dupe us", "he committed fraud", "bank is involved in forgery"),
  Category = c("", "", "", "", "", "", "", "")),
  .Names = c("Text", "Category"), row.names = c(NA, -8L), class = "data.frame")
After preparing the corpus, removing stopwords, and creating the training and testing matrices, I run the following commands:
model <- naiveBayes(as.matrix(trainmatrix), as.factor(traindata$Category))
results <- predict(model, as.matrix(testmatrix), type = "raw")
and get the following result:
[Image: posterior probability values for the 8 texts in the test set]
But when I pass only one text (in this case, the first of the 8), its posterior probabilities change:
[Image: changed posterior probability values for the first text from the test set]
I don't understand why this happens, since the training data stays the same and nothing else in the code changes. Can someone please help me?
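For context, the preprocessing step mentioned above (preparing the corpus, removing stopwords, creating the matrices) is not shown in the post. The block below is only a minimal sketch of what such a step might look like with tm, not the exact code used: the helper name build_matrix and the short example vectors are hypothetical, and the particular cleaning steps (lowercasing, punctuation removal, whitespace stripping) are assumptions.

```r
library(tm)

# Hypothetical helper (not from the original post): turn a character vector
# into a cleaned document-term matrix, one row per text, one column per term.
build_matrix <- function(texts) {
  corpus <- VCorpus(VectorSource(texts))
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, stripWhitespace)
  as.matrix(DocumentTermMatrix(corpus))
}

# Small illustrative inputs taken from the data above
train_texts <- c("Ellison previously pleaded guilty to charges of health care fraud and money laundering",
                 "Paris was under attack on Friday by ISIS")
test_texts <- c("he is in jail for corruption charges", "he committed fraud")

trainmatrix    <- build_matrix(train_texts)
testmatrix     <- build_matrix(test_texts)      # all test texts together
testmatrix_one <- build_matrix(test_texts[1])   # first test text only

# Compare the term columns produced in each case
colnames(testmatrix)
colnames(testmatrix_one)
```

In a sketch like this, each matrix is built from its own texts, so the term columns of the test matrix depend on which test texts are passed in.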