
I have seen this problem on Stack Overflow before, but the solutions didn't work for me; see Save and Load testing classify Naive Bayes Classifier in NLTK in another method. I am baffled as to why my accuracy is so different when I load the pickled classifier as opposed to just training and classifying in the same program. The first code block loads the pickled classifier; the second does all the training and classifying together. The second method gives an accuracy of 99% while the first gives 81%...

import csv
import pickle
import re

import nltk
from nltk.tokenize import WordPunctTokenizer
from nltk.util import bigrams

# Load the previously trained and pickled classifier.
Academic_classifier = pickle.load(open('Academic_classifier.pickle', 'rb'))

# Read the test tweets, dropping the header row.
tweets = []
readdata = csv.reader(open(r'C:\Users\Troy\Documents\Data\Gold_test.csv', 'r'))
for row in readdata:
    tweets.append(row)
Header = tweets[0]
tweets.pop(0)
Academic_test_tweets = tweets[:]

# Tokenize, collapse repeated-character runs (e.g. 'soooo' -> 'soo'),
# and append joined bigrams to each tweet's token list.
Tweets = []
for (words, sentiment) in tweets:
    words_filtered = [e.lower() for e in WordPunctTokenizer().tokenize(words) if len(e) >= 3]
    words_filtered = [re.sub(r'(.)\1+', r'\1\1', e) for e in words_filtered if len(e) >= 3]
    bigram_list = [bi[0] + bi[1] for bi in bigrams(words_filtered)]
    Tweets.append((words_filtered + bigram_list, sentiment))
Academic_test_tweets_words = Tweets[:]

# get_word_features, get_words_in_tweets and extract_features are defined elsewhere.
word_features = get_word_features(get_words_in_tweets(Academic_test_tweets_words))
Academic_test_set = nltk.classify.apply_features(extract_features, Academic_test_tweets_words)

print(nltk.classify.accuracy(Academic_classifier, Academic_test_set), 'tweet corpus used in academic paper "Sentiment Analysis on the Social Networks Using Stream Algorithms" by Nathan Aston, Timothy Munson, Jacob Liddle, Garrett Hartshaw, Dane Livingston, Wei Hu  *compare to their accuracy of 87.5%')

Compare that to this code, where I train and test in the same run. I use the same definitions for everything, so I know the problem isn't with the definitions. The only difference is the pickled classifier... what is happening?

# Read and preprocess the test tweets exactly as in the first block.
tweets = []
readdata = csv.reader(open(r'C:\Users\Troy\Documents\Data\Gold_test.csv', 'r'))
for row in readdata:
    tweets.append(row)
Header = tweets[0]
tweets.pop(0)
Academic_test_tweets = tweets[:]

Tweets = []
for (words, sentiment) in tweets:
    words_filtered = [e.lower() for e in WordPunctTokenizer().tokenize(words) if len(e) >= 3]
    words_filtered = [re.sub(r'(.)\1+', r'\1\1', e) for e in words_filtered if len(e) >= 3]
    bigram_list = [bi[0] + bi[1] for bi in bigrams(words_filtered)]
    Tweets.append((words_filtered + bigram_list, sentiment))
Academic_test_tweets_words = Tweets[:]

word_features = get_word_features(get_words_in_tweets(Academic_test_tweets_words))
Academic_test_set = nltk.classify.apply_features(extract_features, Academic_test_tweets_words)

# Read and preprocess the training tweets the same way.
tweets = []
readdata = csv.reader(open(r'C:\Users\Troy\Documents\Data\Gold_train.csv', 'r'))
for row in readdata:
    tweets.append(row)
Header = tweets[0]
tweets.pop(0)
AcademicTweets = tweets[:]

Tweets = []
for (words, sentiment) in tweets:
    words_filtered = [e.lower() for e in WordPunctTokenizer().tokenize(words) if len(e) >= 3]
    words_filtered = [re.sub(r'(.)\1+', r'\1\1', e) for e in words_filtered if len(e) >= 3]
    bigram_list = [bi[0] + bi[1] for bi in bigrams(words_filtered)]
    Tweets.append((words_filtered + bigram_list, sentiment))
AcademicWords = Tweets[:]

# NOTE: this reassigns the global word_features that extract_features
# (defined elsewhere) presumably reads.
word_features = get_word_features(get_words_in_tweets(AcademicWords))
Academic_training_set = nltk.classify.apply_features(extract_features, AcademicWords)
Academic_classifier = nltk.NaiveBayesClassifier.train(Academic_training_set)
#Negative_classifier.show_most_informative_features(1)
print(nltk.classify.accuracy(Academic_classifier, Academic_test_set), 'tweet corpus used in academic paper "Sentiment Analysis on the Social Networks Using Stream Algorithms" by Nathan Aston, Timothy Munson, Jacob Liddle, Garrett Hartshaw, Dane Livingston, Wei Hu  *compare to their accuracy of 87.5%')

# Persist the trained classifier for the first block to load.
pickle.dump(Academic_classifier, open('Academic_classifier.pickle', 'wb'))
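
One way to rule out pickling itself, as suggested in the comments below, is to dump, reload, and re-score in the same program instance. A minimal sketch, assuming Academic_test_set is still in scope:

# Round-trip sanity check: reload the just-dumped classifier and re-score.
with open('Academic_classifier.pickle', 'rb') as f:
    reloaded_classifier = pickle.load(f)
print(nltk.classify.accuracy(reloaded_classifier, Academic_test_set))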
  • Generate, save, load, compare -- in one program instance. – Charles Duffy Jul 27 '15 at 19:49
  • Okay, I ran it in one program instance by adding `Academic_classifier=None Academic_classifier=pickle.load(open('Academic_classifier.pickle','rb')) print(nltk.classify.accuracy(Academic_classifier, Academic_test_set))` and I get 99.1% accuracy both times... I don't see the difference in my code, do you? – lrrr Jul 28 '15 at 12:41
  • Sorry -- line-by-line comparison is the kind of work that should be done *before* asking a StackOverflow question; I don't have the time/effort to spend on it here. I'd suggest saving the versions in two separate files and using vimdiff or similar. – Charles Duffy Jul 28 '15 at 14:18
  • I understand, and appreciate your advice. It turns out that the accuracy depends on the order in which the training and test data are manipulated, so there must be a global variable that gets carried over or redefined with the training data, affecting the accuracy (a sketch of a fix follows these comments). The true accuracy is 81%, but I also found it classifies all of the tweets as the same sentiment every time, even with other data sets... I'm using a one-vs-all setup to analyse 4 sentiments and it always labels them as not a certain sentiment, leaving all tweets labeled as not positive, not negative, not angry and not excited. – lrrr Jul 28 '15 at 14:46
  • Hello, are you still having this problem? After running into what may be the same issue with RandomForestRegressor, I figured out that it was because the order of the features in my saved model differed from my test data. Say my model has columns `A, B, C`. My trained/saved model uses them in that order, but I also save a pickle/joblib of which columns are important with `save=list(set(df.columns))`, which might reorder them to `B, A, C`. So when I load the model and columns (`df[save]`), the prediction has weird values because the order differs from what my saved model expects (see the second sketch below). – mrbTT Nov 14 '18 at 15:35
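
The diagnosis above fits how nltk.classify.apply_features works: it is lazy, so feature dicts are computed at evaluation time with whatever the global word_features holds at that moment. A minimal sketch of a fix is to pass the feature list explicitly; the 'contains(word)' body below is an assumption (the usual NLTK-book pattern), and the real fix is pickling the training-time word_features alongside the classifier and reusing it at test time:

def extract_features(document, word_features):
    # Membership features over an explicit feature list, so the result
    # cannot silently change when a global is reassigned later.
    document_words = set(document)
    return {'contains(%s)' % word: (word in document_words) for word in word_features}

# Build the test set against the saved training-time feature list.
# nltk.classify.accuracy accepts a plain list of (featureset, label) pairs.
Academic_test_set = [(extract_features(doc, word_features), label)
                     for (doc, label) in Academic_test_tweets_words]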

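The column-order pitfall in the last comment has the same shape: persist the feature order as an ordered list rather than a set. A sketch with illustrative names (df, model, df_new, and the file name are placeholders):

import joblib

# Save the exact column order the model was trained on.
feature_order = list(df.columns)  # ordered; list(set(...)) would scramble it
joblib.dump(feature_order, 'feature_order.joblib')

# At prediction time, select columns in that saved order.
feature_order = joblib.load('feature_order.joblib')
predictions = model.predict(df_new[feature_order])
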
0 Answers