
I have been trying to analyze a dataset of about 7,000 tweets for Twitter sentiment analysis, using a Naive Bayes model to predict whether a tweet is negative or not. The confusion matrix shows no useful predictions, just the base rate, so the model is effectively not predicting anything. How can I get it to make predictions? Maybe the sparsity threshold passed to removeSparseTerms needs to change. If Naive Bayes can't predict anything here, what other models would be good to use for this dataset?
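
For reference, a minimal base-R sketch of what "just the base rate" looks like. The vectors `actual` and `predicted` are made-up stand-ins for `testSparse$Negative` and `naive.bayes.pred`, not values from the real dataset:

```r
# Hypothetical labels standing in for testSparse$Negative and naive.bayes.pred
actual <- factor(c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE),
                 levels = c(FALSE, TRUE))
predicted <- factor(rep(FALSE, length(actual)), levels = c(FALSE, TRUE))

# Cross-tabulate predictions against the true labels
conf.table <- table(Predicted = predicted, Actual = actual)
print(conf.table)

# Accuracy of the model versus always guessing the majority class
accuracy <- mean(predicted == actual)
base.rate <- max(prop.table(table(actual)))

# TRUE when only one class is ever predicted
predicts.one.class <- length(unique(predicted)) == 1
```

When `predicts.one.class` is TRUE and `accuracy` equals `base.rate`, the model is behaving like a constant majority-class guess, which is the symptom described above.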

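library(tm)       # removeSparseTerms
library(caTools)  # sample.split
library(caret)    # train, trainControl, confusionMatrix

tweets$Negative = as.factor(tweets$Sentiment <= -1)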
# Create corpus, Convert to lower-case, remove punctuation, remove stopwords, 
# stem document, create frequency matrix

sparse = removeSparseTerms(frequencies, 0.995)
tweetsSparse = as.data.frame(as.matrix(sparse))
tweetsSparse$Negative = tweets$Negative

split = sample.split(tweetsSparse$Negative, SplitRatio = 0.7)
trainSparse = subset(tweetsSparse, split==TRUE)
testSparse = subset(tweetsSparse, split==FALSE)

prepare_testData <- function(model.training.data, test.dtm){
  # Create an empty dataframe with column names same as features in training data
  train.features <- names(model.training.data)
  testData <- matrix(data = rep(0, length(train.features) * nrow(test.dtm)), 
                     nrow = nrow(test.dtm), ncol = length(train.features))
  colnames(testData) <- train.features
  row.names(testData) <- row.names(test.dtm)

  # features common to both train and test are copied from test data
  common.features <- intersect(train.features, names(test.dtm))
  for(i in seq_along(common.features)) {  # seq_along is safe when there are no common features
    testData[,common.features[i]] <- test.dtm[,common.features[i]]
  }
  testData <- as.data.frame(testData)
  return(testData)
}

########### Naive Bayes model training ###########
naive.bayes.model <- train(Negative ~., 
                           data = trainSparse, 
                           trControl = trainControl(method = "cv", number = 5),
                           method = "nb")
naive.bayes.testData <- prepare_testData(naive.bayes.model$trainingData[, -ncol(naive.bayes.model$trainingData)],
                                         testSparse)
naive.bayes.pred <- predict(naive.bayes.model, naive.bayes.testData)
naive.bayes.metrics <- confusionMatrix(naive.bayes.pred, testSparse$Negative)
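
To show what the column-alignment step in `prepare_testData` is meant to do, here is a small base-R sketch on toy data. The term names and counts are invented for illustration, and the loop is replaced by a single vectorized assignment, so this is a condensed equivalent under those assumptions rather than the exact function above:

```r
# Toy stand-ins for the training features and the test frequency matrix
train.features <- c("bad", "good", "hate")
test.dtm <- data.frame(good = c(1, 0), love = c(0, 2))

# Start from all zeros, one column per training feature
aligned <- matrix(0, nrow = nrow(test.dtm), ncol = length(train.features),
                  dimnames = list(row.names(test.dtm), train.features))

# Copy over only the terms that appear in both train and test
common.features <- intersect(train.features, names(test.dtm))
aligned[, common.features] <- as.matrix(test.dtm[, common.features, drop = FALSE])
aligned <- as.data.frame(aligned)
print(aligned)
```

Terms seen only in the test data (`love` here) are dropped, and training terms missing from the test data (`bad`, `hate`) stay at zero.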
gamelanguage
  • You need to make a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). That should include sample input data so we can run the code to re-create the problem. Also remove any lines of code not directly related to your problem. – MrFlick Apr 03 '16 at 15:50
  • I tried a few other models that were able to produce predictions, specifically SVM and Random Forest. Nothing I tried could get a prediction out of Naive Bayes. I don't understand the mechanics behind Naive Bayes, but maybe it just wasn't very applicable to this dataset. Sorry I can't be of more help. – gamelanguage Apr 03 '16 at 20:55
