I followed the post "How to use save model for prediction in python" to save my model.
When I load the saved model and predict on new data, I get the following error.
Is there anything I can do to resolve it?
UnicodeEncodeError: 'decimal' codec can't encode character u'\u2019' in position 510: invalid decimal Unicode string
My entire code:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
X_train, X_test, y_train, y_test = train_test_split(df['IssueDetails'], df['CRST'], random_state = 0)
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
clf = LinearSVC().fit(X_train_tfidf, y_train)
cif_svm = Pipeline([('tfidf', tfidf_transformer), ('SVC', clf)])
from sklearn.externals import joblib
joblib.dump(cif_svm, 'modelsvm.pk1')
Fitmodel = joblib.load('modelsvm.pk1')
Fitmodel.predict(df_v)
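For reference, here is a minimal sketch of one likely cause, assuming df_v holds raw text strings. The saved pipeline contains only the TfidfTransformer and the classifier, so predict() receives raw text where a count matrix is expected, and the transformer tries to convert the strings to numbers. Keeping CountVectorizer inside the pipeline and fitting the whole pipeline on raw text may avoid this. The toy data, labels, and file path below are hypothetical stand-ins for df['IssueDetails'], df['CRST'], and the real model file. joblib is imported directly here because newer scikit-learn releases no longer ship sklearn.externals.joblib.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for df['IssueDetails'] and df['CRST'].
texts = [
    "printer won\u2019t print",
    "printer out of paper",
    "cannot log in to email",
    "email password reset needed",
]
labels = ["hardware", "hardware", "account", "account"]

# Keep the vectorizer inside the pipeline so that predict() accepts raw text.
model = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("svc", LinearSVC()),
])
model.fit(texts, labels)

# Save and reload the whole fitted pipeline (hypothetical temporary path).
path = os.path.join(tempfile.mkdtemp(), "modelsvm.pkl")
joblib.dump(model, path)
loaded = joblib.load(path)

# New raw strings, including the u'\u2019' character from the error message.
new_texts = ["the printer isn\u2019t printing", "reset my email password"]
predictions = loaded.predict(new_texts)
print(predictions)
```

With this layout the loaded object does the tokenizing itself, so the new data can be passed as a plain list or Series of strings rather than a precomputed matrix.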