I am following this Stack Overflow post on how to save a classifier. When I try the approach described in the second answer there, I keep getting:
ValueError: Vocabulary wasn't fitted or is empty!
My training code is as follows:
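import joblib
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

train = load_files(learning_data_train)
count_vect = CountVectorizer(tokenizer=tokenize, stop_words='english')  # tokenize: my own tokenizer function (not shown)
X_train_counts = count_vect.fit_transform(train.data)
clf = SGDClassifier(loss='hinge', penalty='l1', alpha=1e-3, max_iter=5, tol=None).fit(X_train_counts, train.target)  # n_iter=5 in older scikit-learn
filename = "SGD.pk1"
joblib.dump(clf, filename)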
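For illustration, here is a small self-contained sketch showing where the fitted state ends up after training code like the above. The documents, labels, and the absence of a custom tokenizer are assumptions made for this example only. The learned word-to-column mapping is stored on the vectorizer (its vocabulary_ attribute), while the classifier only holds one weight per column index, so dumping clf alone does not save the vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy stand-ins for train.data / train.target
docs = ["the cat sat on the mat", "dogs chase cats",
        "stocks fell sharply today", "markets rallied on earnings"]
labels = [0, 0, 1, 1]

count_vect = CountVectorizer(stop_words="english")
X_train_counts = count_vect.fit_transform(docs)

# max_iter (with tol=None) runs a fixed number of epochs, like n_iter in older releases
clf = SGDClassifier(loss="hinge", penalty="l1", alpha=1e-3,
                    max_iter=5, tol=None).fit(X_train_counts, labels)

# The word -> column index mapping lives on the vectorizer...
print(sorted(count_vect.vocabulary_.items(), key=lambda kv: kv[1]))
# ...while the classifier only knows column positions (one weight per column)
print(clf.coef_.shape)
```

The point of the sketch is that clf.coef_ has exactly as many columns as the vectorizer's vocabulary, but nothing inside clf records which word each column stands for.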
And my testing code is as follows:
import os
import joblib
from sklearn.feature_extraction.text import CountVectorizer

count_vect = CountVectorizer(tokenizer=tokenize, stop_words='english')
filename = "SGD.pk1"
clf = joblib.load(filename)
print(clf)
folder = "testfolder/"
docs_new = []
for name in os.listdir(folder):
    with open(os.path.join(folder, name), "r") as f:
        docs_new.append(f.read())
X_new_counts = count_vect.transform(docs_new)
predicted = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predicted):
    print(' => %s' % (train.target_names[category]))
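The testing code above constructs a brand-new CountVectorizer, which has never seen any documents. A minimal sketch (toy documents, hypothetical) of what that implies: transform() on an unfitted vectorizer raises the vocabulary error, and fitting a new vectorizer on the test documents instead would not help, because it learns a different vocabulary than the one the saved classifier was trained against.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents
train_docs = ["the cat sat on the mat", "dogs chase cats"]
new_docs = ["a cat and a dog"]

# A fresh vectorizer has no vocabulary_, so transform() cannot map words to columns
fresh = CountVectorizer(stop_words="english")
try:
    fresh.transform(new_docs)
    raised = False
except ValueError as exc:  # recent releases raise NotFittedError, a ValueError subclass
    raised = True
    print(type(exc).__name__, exc)

# Re-fitting on the new documents builds a different vocabulary,
# so its columns would not line up with the saved classifier's weights
fitted_on_train = CountVectorizer(stop_words="english").fit(train_docs)
refit_on_new = CountVectorizer(stop_words="english").fit(new_docs)
print(len(fitted_on_train.vocabulary_), len(refit_on_new.vocabulary_))
```

The exact wording of the exception message varies between scikit-learn versions, which is why the sketch catches ValueError rather than matching the text.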
The error is raised when executing:
X_new_counts = count_vect.transform(docs_new)
Is there something I am doing wrong here?
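One way to avoid the mismatch, offered as a sketch rather than the linked post's method, is to persist the fitted vectorizer together with the classifier, for example by wrapping both in a scikit-learn Pipeline and dumping that single object. The toy data, category names, and file name below are hypothetical. It assumes the standalone joblib package (older scikit-learn releases exposed it as sklearn.externals.joblib). The category names are saved alongside the model because the testing script never loads the training data, so train.target_names is not available there.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# --- training side (toy data stands in for load_files(...)) ---
train_docs = ["the cat sat on the mat", "dogs chase cats",
              "stocks fell sharply today", "markets rallied on earnings"]
train_target = [0, 0, 1, 1]
target_names = ["pets", "finance"]  # hypothetical category names

# The pipeline keeps the fitted vectorizer and the classifier in one object
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    SGDClassifier(loss="hinge", penalty="l1", alpha=1e-3, max_iter=5, tol=None),
)
model.fit(train_docs, train_target)

path = os.path.join(tempfile.gettempdir(), "sgd_pipeline.pkl")  # hypothetical file name
joblib.dump({"model": model, "target_names": target_names}, path)

# --- testing side: no new CountVectorizer is created here ---
saved = joblib.load(path)
docs_new = ["my cat chased the dogs", "earnings beat forecasts"]
# predict() runs the saved vectorizer's transform(), then the classifier's predict()
predicted = saved["model"].predict(docs_new)
for doc, category in zip(docs_new, predicted):
    print("%r => %s" % (doc, saved["target_names"][category]))
```

Dumping count_vect as a second file next to clf and loading both in the testing script would work the same way; the pipeline just makes it impossible to save one without the other.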