
I have this little chunk of code I found here:

import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
from nltk.corpus import stopwords

# The corpus must be downloaded once beforehand: nltk.download('movie_reviews')

def word_feats(words):
    # Bag-of-words features: every word present maps to True
    return {word: True for word in words}

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# First 3/4 of each class for training, the rest for testing.
# Integer division (//) keeps the cutoffs valid slice indices in Python 3.
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]
print('train on %d instances, test on %d instances' % (len(trainfeats), len(testfeats)))

classifier = NaiveBayesClassifier.train(trainfeats)
print('accuracy:', nltk.classify.util.accuracy(classifier, testfeats))
classifier.show_most_informative_features()
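The two cutoff lines keep the first three quarters of each class for training and the rest for testing. Under Python 3, `len(negfeats)*3/4` evaluates to a float, and a float cannot be used as a slice index, so the cutoff needs integer division (`//`). A minimal sketch of the split on its own, using a hypothetical `split_three_quarters` helper and made-up placeholder data of 1000 items (the `movie_reviews` corpus has 1000 reviews per class):

```python
def split_three_quarters(items):
    """Return (first 3/4, last 1/4) of a list, like the cutoff lines above."""
    # // is integer division; a float cutoff such as 750.0 raises TypeError when slicing
    cutoff = len(items) * 3 // 4
    return items[:cutoff], items[cutoff:]

# Hypothetical stand-in for negfeats: 1000 (featureset, label) pairs
fake_negfeats = [({'word%d' % i: True}, 'neg') for i in range(1000)]
train, test = split_three_quarters(fake_negfeats)
print(len(train), len(test))  # 750 250
```

With both classes split this way, the script trains on 750 + 750 = 1500 reviews and tests on 250 + 250 = 500.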

But how can I classify an arbitrary word that might be in the corpus?

classifier.classify('magnificent')

Doesn't work. Does it need some kind of object?

Thank you very much.

EDIT: Thanks to @unutbu's feedback, some digging here, and the comments on the original post, I found that the following yields 'pos' or 'neg' for this code (this one is 'pos'):

print(classifier.classify(word_feats(['magnificent'])))
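The same wrapping works for a phrase: `word_feats` only needs a list of tokens. A minimal sketch, assuming plain lowercase whitespace splitting is good enough (it does not strip punctuation and is only a rough stand-in for a real tokenizer); `text_feats` and `classify_text` are hypothetical helper names, not part of NLTK:

```python
def word_feats(words):
    # Same bag-of-words extractor as in the question
    return {word: True for word in words}

def text_feats(text):
    # Hypothetical helper: lowercase, split on whitespace, then build features.
    # Punctuation is left attached to words.
    return word_feats(text.lower().split())

def classify_text(classifier, text):
    # Works with any object exposing .classify(featureset), such as the
    # NaiveBayesClassifier trained in the question
    return classifier.classify(text_feats(text))

print(text_feats('A magnificent film'))
# {'a': True, 'magnificent': True, 'film': True}
```

With the trained classifier from the question, you would call `print(classify_text(classifier, 'a magnificent film'))`.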

and this yields the probability the classifier assigns to 'neg' for the word (use .prob('pos') for the positive class):

print(classifier.prob_classify(word_feats(['magnificent'])).prob('neg'))
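`prob_classify` returns a probability distribution over the labels rather than a single label: `.prob(label)` reads one entry, and `.max()` gives the most likely label, which is what `classify` returns. A minimal sketch, assuming NLTK is installed; it trains on a tiny set of invented feature dicts (not the `movie_reviews` data) so it runs without downloading a corpus:

```python
from nltk.classify import NaiveBayesClassifier

def word_feats(words):
    return {word: True for word in words}

# Toy training data, invented purely for illustration
train = [
    (word_feats(['magnificent', 'film']), 'pos'),
    (word_feats(['wonderful', 'acting']), 'pos'),
    (word_feats(['awful', 'film']), 'neg'),
    (word_feats(['boring', 'plot']), 'neg'),
]
classifier = NaiveBayesClassifier.train(train)

# Distribution over labels for a single-word featureset
dist = classifier.prob_classify(word_feats(['magnificent']))
p_pos = dist.prob('pos')
p_neg = dist.prob('neg')
print(dist.max(), round(p_pos, 3), round(p_neg, 3))
```

With only two labels, the two probabilities sum to 1, so `.prob('neg')` and `.prob('pos')` carry the same information.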
Kevin

1 Answer

print(classifier.classify(word_feats(['magnificent'])))

yields

pos

The classifier.classify method does not operate on individual words per se; it classifies a dict of features. In this example, word_feats maps a sentence (a list of words) to a dict of features, so a single word has to be wrapped in a list before it is passed in.
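This is also why the list wrapper matters. word_feats iterates over whatever it is given, and iterating over a plain string yields its characters, so word_feats('magnificent') produces single-letter features instead of one feature for the word. A minimal pure-Python illustration using the same word_feats:

```python
def word_feats(words):
    # Same extractor as in the question: one True-valued feature per item
    return {word: True for word in words}

# A bare string is iterated character by character:
# keys are the letters m, a, g, n, i, f, c, e, t
print(word_feats('magnificent'))

# A one-element list gives the intended single word feature
print(word_feats(['magnificent']))  # {'magnificent': True}
```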

Here is another example (from the NLTK book) that uses the NaiveBayesClassifier. By comparing what is similar and what is different between that example and the one you posted, you may get a better perspective on how it can be used.
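For comparison, the best-known NaiveBayesClassifier example in the NLTK book (Chapter 6, classifying names by gender) uses a feature extractor along the lines of gender_features below; whether that is exactly the example linked here is an assumption. The contrast is in the shape of the dict: word_feats uses each word as the feature name with the value True, while gender_features uses a fixed feature name whose value varies. Either way, the classifier is given a dict, never a bare string:

```python
def word_feats(words):
    # Question's extractor: feature name = the word, value = True
    return {word: True for word in words}

def gender_features(word):
    # NLTK-book-style extractor: fixed feature name, value = last letter
    return {'last_letter': word[-1]}

print(word_feats(['magnificent']))  # {'magnificent': True}
print(gender_features('Shrek'))     # {'last_letter': 'k'}
```

With a classifier trained on such dicts, the call would be classifier.classify(gender_features('Neo')), mirroring classifier.classify(word_feats(['magnificent'])).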

unutbu