I want to persist my ML model to my local machine. I followed this answer (https://stackoverflow.com/a/29291153/11543494) to store the model locally, but when I load the persisted model I get a KeyError.

I have built an ML model that predicts the category of a URL. I now want to integrate it with my web application, so I created an API using Flask.

I tested the model in a Jupyter notebook, which holds all of the model code. Now I just want to dump the model and use it in my API. In the notebook I get the correct output, but when I load the persisted file in my API code I get a KeyError. I first tried pickle and joblib, but both raised a MemoryError that I was unable to resolve, so I am now trying klepto.

Klepto code

from klepto.archives import dir_archive

# gs_clf is the fitted RandomizedSearchCV, i.e. gs_clf = gs_clf.fit(x_train, y_train)
model = dir_archive('E:/Mayur/Sem 5/Python project/model_klepto', {'result': gs_clf}, serialized=True, cached=False)
model.dump()

API code

import numpy as np
from flask import Flask, request, jsonify, render_template
from klepto.archives import dir_archive

app = Flask(__name__)

demo = dir_archive(
    'E:/Mayur/Sem 5/Python project/model_klepto', {}, serialized=True, cached=False)
demo.load()


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/predict', methods=['POST'])
def predict():
    input = request.form.values()
    final_feature = [np.array(input)]
    prediction = demo['result'].predict([str(final_feature)])

    return render_template('index.html', prediction_text=prediction)


if __name__ == "__main__":
    app.run(debug=True)

When I run the API, I get KeyError: 'result'.
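
One way to narrow this down is to check what the API process actually finds on disk before klepto is involved at all. The following is only a minimal sketch under the assumption (not confirmed by the traceback) that the API might be reading a different or empty directory than the notebook wrote to. `describe_archive_dir` is a hypothetical helper, not part of klepto, and it only inspects the filesystem.

```python
import os


def describe_archive_dir(path):
    """Report what exists at the directory a dir_archive is pointed at.

    Hypothetical diagnostic helper (not a klepto function): it resolves the
    path to an absolute one and lists whatever entries are stored there.
    """
    abs_path = os.path.abspath(path)
    exists = os.path.isdir(abs_path)
    # An empty list here means there is nothing on disk for the archive to load.
    entries = sorted(os.listdir(abs_path)) if exists else []
    return {"path": abs_path, "exists": exists, "entries": entries}


if __name__ == "__main__":
    # Same path as in the question; run this from both the notebook and the API.
    print(describe_archive_dir("E:/Mayur/Sem 5/Python project/model_klepto"))
```

If the notebook and the API print different reports for the same path, the problem lies in where or how the archive was written rather than in the lookup of `'result'`.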

If I run the code below in a Jupyter notebook, I get the correct output:

demo = dir_archive(
    'E:/Mayur/Sem 5/Python project/model_klepto', {}, serialized=True, cached=False)
demo.load()
demo

Output>

dir_archive('model_klepto', {'result': RandomizedSearchCV(cv='warn', error_score='raise-deprecating',
                   estimator=Pipeline(memory=None,
                                      steps=[('vect',
                                              CountVectorizer(analyzer='word',
                                                              binary=False,
                                                              decode_error='strict',
                                                              dtype=<class 'numpy.int64'>,
                                                              encoding='utf-8',
                                                              input='content',
                                                              lowercase=True,
                                                              max_df=1.0,
                                                              max_features=None,
                                                              min_df=1,
                                                              ngram_range=(1,
                                                                           1),
                                                              preprocessor=None,
                                                              stop_words=None,
                                                              strip_accen...
                                                               sublinear_tf=False,
                                                               use_idf=True)),
                                             ('clf',
                                              MultinomialNB(alpha=1.0,
                                                            class_prior=None,
                                                            fit_prior=True))],
                                      verbose=False),
                   iid='warn', n_iter=5, n_jobs=None,
                   param_distributions={'clf__alpha': (0.01, 0.001),
                                        'tfidf__use_idf': (True, False),
                                        'vect__ngram_range': [(1, 1), (1, 2)]},
                   pre_dispatch='2*n_jobs', random_state=None, refit=True,
                   return_train_score=False, scoring=None, verbose=0)}, cached=False)
demo['result'].predict(['http://www.windows.com'])

Output> array(['Computers'], dtype=
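
Note that the notebook call passes `predict` a list containing the raw URL string, whereas the `predict()` route in the API code passes `[str(final_feature)]`, which is the printed form of a list wrapping a NumPy array. This is separate from the KeyError, but once the model loads it would likely matter. A minimal sketch of building the notebook-style input from form values follows. `form_values_to_urls` is a hypothetical helper name, and it assumes each form field holds one URL.

```python
def form_values_to_urls(values):
    """Turn an iterable of form values into a list of non-empty URL strings.

    Hypothetical helper: assumes every submitted form field is a single URL.
    """
    urls = []
    for value in values:
        text = str(value).strip()
        if text:  # skip empty fields
            urls.append(text)
    return urls


if __name__ == "__main__":
    # Stand-in for request.form.values() in the Flask route.
    print(form_values_to_urls(iter([" http://www.windows.com "])))
```

Inside the route, the result could then be passed straight to `demo['result'].predict(...)`, matching the shape used in the notebook.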

Here is a screenshot of the stack trace: [image: Stack trace]

  • It would help to (1) see your traceback, or at the very least the last several lines of it, and (2) have you distill your code to something minimal that is testable by others. It's possible that changing the `keymap` to one of the string or pickle variants could help. It's hard to say without seeing what the error indicates at the end of the traceback. – Mike McKerns Oct 23 '19 at 12:49
  • @MikeMcKerns Thank you for the response. I have added a screenshot of the complete stack trace. – Mayur Chawda Oct 24 '19 at 11:40
