I extracted keywords from the Wikipedia article for the movie Casino and built a feature set in which every key:value pair is keyword:keyword, so each feature name and its value are identical. I labeled the feature set as "DRAMA" and "CRIME", producing a list of (featureset, label) tuples, and passed that list as the training input to NLTK's Naive Bayes classifier. When I then try to classify a new feature set (for example, {'roxy': 'roxy', 'sports': 'sports', 'wan': 'wan'}), the classifier seems to ignore it and does not return any label.

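import fileinput
import nltk

def feature_gen(wiki_dict, mt_movie):
    # Pair the same keyword feature set with every label listed in the file mt_movie.
    temp = [(wiki_dict, label.strip()) for label in fileinput.input(mt_movie)]
    train(temp)

def train(train_sets):
    # Keep the trained classifier in a module-level global.
    global classifier
    classifier = nltk.NaiveBayesClassifier.train(train_sets)

url = ["http://en.wikipedia.org/wiki/Casino_(film)"]
mt_list = ['casino.txt']

# The code that builds wiki_dict from url and calls feature_gen is not shown here.
classifier.classify({'roxy': 'roxy', 'sports': 'sports', 'wan': 'wan'})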
loopbackbee
  • Can you post the full code and a sample of the input? Given the code you've posted, nothing will run ;) – alvas Jan 17 '14 at 04:52
  • The idea of a text classifier is to annotate natural text, so the biggest problem with the code snippet is the wrong input to `classifier.classify()`. See http://stackoverflow.com/questions/20827741/nltk-naivebayesclassifier-training-for-sentiment-analysis/20833372#20833372 for how to train and use the classifier in NLTK. – alvas Jan 17 '14 at 04:55
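
For illustration, here is a minimal sketch of training and querying the classifier. It is not the asker's actual pipeline: the keyword lists and the `to_features` helper are made up for the example, and each label gets its own feature set, whereas the posted `feature_gen` pairs the same `wiki_dict` with every label, which leaves the classifier nothing to tell the labels apart. As far as I know, `classify()` drops feature names that never occurred in training (such as 'roxy' here) and returns the label as a value, so in a script it has to be printed or stored to be seen.

```python
import nltk

# Hypothetical keyword lists, invented for this example only.
crime_keywords = ["mob", "casino", "heist"]
drama_keywords = ["family", "marriage", "loss"]

def to_features(keywords):
    """Map each keyword to True, meaning "this keyword is present"."""
    return {word: True for word in keywords}

# One (featureset, label) tuple per training example,
# with a different feature set for each label.
train_sets = [
    (to_features(crime_keywords), "CRIME"),
    (to_features(drama_keywords), "DRAMA"),
]
classifier = nltk.NaiveBayesClassifier.train(train_sets)

# classify() returns the label; nothing is shown unless it is printed.
label = classifier.classify(to_features(["casino", "mob"]))
print(label)

# 'roxy' was never seen in training, so only 'casino' influences the result.
mixed = classifier.classify(to_features(["casino", "roxy"]))
print(mixed)
```

If every feature in the query is unknown to the classifier, only the label priors are left to decide, so the returned label carries little information.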
