I have a Python script that successfully creates, trains and pickles a Naive Bayes classifier for string sentiment analysis. I've adapted code snippets found here and here, which have been great for an informed beginner like myself. However, both resources stop short of showing how to use a pickled classifier. Previous StackOverflow answers (here and here) hint that both the classifier object itself AND the feature vector should be saved to disk and then loaded together for later use, but neither includes the syntax for doing that.
EDITS: this code works to train and store the classifier:
...
def get_words_in_descs(descs):
    all_words = []
    for (words, sentiment) in descs:
        all_words.extend(words)
    return all_words

def get_word_features(wordlist):
    wordlist = nltk.FreqDist(wordlist)
    word_features = wordlist.keys()
    return word_features

def extract_features(document):
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

training = [
    (['Multipurpose 4140 alloy steel'], 'metal'),
    (['Easy-to-machine polyethylene tube'], 'plastic'),
    ...
]
word_features = get_word_features(get_words_in_descs(training))
training_set = nltk.classify.apply_features(extract_features, training)
classifier = nltk.NaiveBayesClassifier.train(training_set)
outputFile = open('maxModel.pkl','wb')
pickle.dump(classifier, outputFile)
outputFile.close()
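For reference, here is a minimal sketch of what saving the classifier and the word features together might look like. The helper names `save_model` and `load_model`, the dictionary keys, and the single-file layout are assumptions for illustration, not something taken from the linked answers. The `list(...)` call is there because a `dict_keys` view (what `.keys()` returns in Python 3) cannot be pickled directly.

```python
import pickle


def save_model(classifier, word_features, path):
    """Pickle the classifier and its word features into one file (hypothetical helper)."""
    bundle = {
        'classifier': classifier,
        # Convert to a plain list: dict_keys views are not picklable.
        'word_features': list(word_features),
    }
    with open(path, 'wb') as output_file:
        pickle.dump(bundle, output_file)


def load_model(path):
    """Return (classifier, word_features) from a file written by save_model."""
    with open(path, 'rb') as input_file:
        bundle = pickle.load(input_file)
    return bundle['classifier'], bundle['word_features']
```

With the training script above, this might be called as `save_model(classifier, word_features, 'maxModel.pkl')`, and the second script could then start with `classifier, word_features = load_model('maxModel.pkl')`.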
EDITS: Again, the code above works great. My issue is in a separate .py file, where I try to unpickle this classifier and then use it to classify a new, previously unseen string. I originally thought the problem was that I was separating the classifier from the word_features, but maybe something else is wrong?
Here is the code that is not working. I now get this error... is it expecting a list someplace?
'dict_keys' object has no attribute 'copy'
...
def get_word_features(wordlist):
    wordlist = nltk.FreqDist(wordlist)
    word_features = wordlist.keys()
    return word_features

with open('maxModelClassifier.pkl', 'rb') as fid:
    loaded_classifier = pickle.load(fid)
#print(str(loaded_classifier.show_most_informative_features(100)))
#try to use the loaded_classifier:
print(loaded_classifier.classify(get_word_features(['super-cushioning', 'foam', 'sheet', 'adhesive-back', 'polyurethane'])))
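One possible reading of the error, offered as a sketch rather than a confirmed fix: `classify` appears to want a feature dictionary (the same shape `extract_features` produced during training), while `get_word_features` returns a `dict_keys` view, which has no `copy` method. The version below passes the word features in explicitly. `classify_tokens` is a hypothetical helper name, and the sketch assumes `word_features` is the same vocabulary that was used in training, restored by whatever means.

```python
def extract_features(document, word_features):
    """Build the 'contains(word)' feature dict for a list of tokens."""
    document_words = set(document)
    return {
        'contains(%s)' % word: (word in document_words)
        for word in word_features
    }


def classify_tokens(classifier, word_features, tokens):
    """Classify tokens with any object exposing classify(featureset) (hypothetical helper)."""
    # The classifier receives a dict, not the raw token list or a dict_keys view.
    return classifier.classify(extract_features(tokens, word_features))
```

The last line of the failing script would then become something like `print(classify_tokens(loaded_classifier, word_features, ['super-cushioning', 'foam', 'sheet', 'adhesive-back', 'polyurethane']))`.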
Thanks for any insights.