
Referring to the post "How to use save model for prediction in python":

When I load the saved model and predict on new data, I get the following error:

UnicodeEncodeError: 'decimal' codec can't encode character u'\u2019' in position 510: invalid decimal Unicode string

Is there anything I can do to resolve it?

My entire code:

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
X_train, X_test, y_train, y_test = train_test_split(df['IssueDetails'], df['CRST'], random_state = 0)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = LinearSVC().fit(X_train_tfidf, y_train)
cif_svm = Pipeline([('tfidf', tfidf_transformer), ('SVC', clf)])

from sklearn.externals import joblib
joblib.dump(cif_svm, 'modelsvm.pk1')

Fitmodel = joblib.load('modelsvm.pk1')
Fitmodel.predict(df_v)
mkpisk

1 Answer


I found the answer to my question above. I used the code below for prediction:

datad['CRSTS']=datad['Detail'].apply(lambda x: unicode(clf.predict(count_vect.transform([x]))))
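
A likely reason the original call failed is that the saved pipeline starts at the TfidfTransformer, so predict() hands raw strings to a step that expects numeric counts. The workaround above avoids that by calling count_vect.transform first. As a minimal sketch of an alternative, assuming a recent scikit-learn where joblib is imported directly rather than from sklearn.externals, the vectorizer can be placed inside the pipeline so the reloaded model accepts raw text. The toy texts and labels below are hypothetical stand-ins for df['IssueDetails'] and df['CRST'].

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for df['IssueDetails'] and df['CRST'].
texts = [
    u"printer won\u2019t print",
    u"printer is out of paper",
    u"cannot log in to email",
    u"email password expired",
]
labels = ["hardware", "hardware", "account", "account"]

# Keep the vectorizer inside the pipeline so predict() accepts raw strings.
model = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("svc", LinearSVC()),
])
model.fit(texts, labels)

# Save the whole fitted pipeline, then reload it as a separate object.
path = os.path.join(tempfile.mkdtemp(), "modelsvm.pk1")
joblib.dump(model, path)
loaded = joblib.load(path)

# Raw text goes straight in; no separate count_vect.transform call is needed.
new_texts = [u"the printer is jammed", u"reset my email password"]
predictions = loaded.predict(new_texts)
print(predictions)
```

With this layout the same three steps run at training and prediction time, so the classifier sees TF-IDF features in both cases.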
mkpisk