I'm working on a class project (homework, if you will). It takes anime titles and their genres, each labeled as relevant or irrelevant, and I'm trying to build a NaiveBayesClassifier from that data. I then want to pass in a set of genres and have the classifier tell me whether it is relevant or irrelevant. I currently have the following:
import nltk
# each training example is (feature dict of genres, label)
trainingdata = [
    ({'drama': True, 'mystery': True, 'horror': True, 'psychological': True}, 'relevant'),
    ({'drama': True, 'fantasy': True, 'romance': True, 'adventure': True, 'science fiction': True}, 'unrelevant'),
]
classifier = nltk.classify.naivebayes.NaiveBayesClassifier.train(trainingdata)
# genres of the anime to classify
anime = {'Fantasy': True, 'Comedy': True, 'Supernatural': True}
classifier.classify(anime)
prob_dist = classifier.prob_classify(anime)
print("relevant " + str(prob_dist.prob("relevant")))
print("unrelevant " + str(prob_dist.prob("unrelevant")))
My actual training data is larger than the snippet above:
size of training array: 110
relevant length: 57
unrelevant length: 53
Some of the results I get:
relevant Tantei Opera Milky Holmes TD
input data passed to classify: {'Mystery': True, 'Comedy': True, 'Super': True, 'Power': True}
relevant 0.518018018018
unrelevant 0.481981981982
relevant Juuou Mujin no Fafnir
input data passed to classify: {'Romance': True, 'Fantasy': True, 'School': True}
relevant 0.518018018018
unrelevant 0.481981981982
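Every input gets exactly the same probabilities, and they are essentially just the class ratio (57/110 ≈ 0.518), so it looks like the classifier is not using my input data at all. I'm not sure what I am doing wrong.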
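As a quick sanity check on those numbers, here is a small sketch in plain Python (no NLTK needed). It compares the raw class ratio with a smoothed one. The add-0.5 smoothing is an assumption about how NLTK estimates the label prior by default (expected likelihood estimation), and the variable names are only for illustration.

```python
# Class counts from the training data described above.
n_relevant = 57
n_unrelevant = 53
n_total = n_relevant + n_unrelevant  # 110

# Raw class ratio, with no smoothing.
raw_prior = n_relevant / float(n_total)

# Assumption: the label prior is smoothed by adding 0.5 to each of the
# two class counts before normalizing.
gamma = 0.5
n_labels = 2
denominator = n_total + n_labels * gamma
smoothed_relevant = (n_relevant + gamma) / denominator
smoothed_unrelevant = (n_unrelevant + gamma) / denominator

print("raw prior:           %.12f" % raw_prior)
print("smoothed relevant:   %.12f" % smoothed_relevant)
print("smoothed unrelevant: %.12f" % smoothed_unrelevant)
```

Under that assumption the smoothed values come out as 0.518018018018 and 0.481981981982, the same as the output above, which would suggest the classifier is returning only the label prior and none of the input features are contributing.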
I looked at the question "nltk NaiveBayesClassifier training for sentiment analysis", and I feel like I am doing it correctly. The only thing I am not doing is explicitly listing every key that is absent from a given feature set (for example, setting the missing genres to False). Does that matter?
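One thing that might be worth ruling out is a mismatch between the feature names used for training and those used for classification, since dictionary keys are case-sensitive ('fantasy' and 'Fantasy' are different keys). Below is a minimal sketch of a helper that builds feature dicts the same way in both places. `genres_to_features` is a hypothetical name for illustration, not part of NLTK.

```python
def genres_to_features(genres):
    """Build a feature dict with normalized (stripped, lowercase) genre keys.

    Hypothetical helper for illustration; not an NLTK function.
    """
    features = {}
    for genre in genres:
        key = genre.strip().lower()
        if key:  # skip empty strings
            features[key] = True
    return features


train_features = genres_to_features(['drama', 'fantasy', 'romance'])
query_features = genres_to_features(['Fantasy', 'Comedy', 'Supernatural'])

# Keys shared between the two dicts. Without normalization, 'Fantasy'
# and 'fantasy' would not have matched.
shared = sorted(set(train_features) & set(query_features))
print(shared)
```

If the same helper is used for both the training examples and the dict passed to `classify`, at least the genres that appear in both should line up by name.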
Thanks!