I have a Flask application that works in my local environment, but it doesn't work when I run it in production mode.

I'm using pickle to save my model, and I tested joblib too.

The problem occurs when I load the pickle file: I get a 504 timeout error. I load the file like this, once it has been generated by the training: model = pickle.load(open(file)).

I'm pretty sure it's the pickle file generated by the training that throws this error (I tested with other pickle files).

After more investigation, it seems that what the Pipeline class injects into the pickle causes the problem:

model = Pipeline(
        [
            ('features', my_data),
            ('model', ensemble.RandomForestRegressor(min_samples_leaf=1, n_jobs=-1))
        ])
...
pickle.dump(model, file)

This works just fine:

model = Pipeline(
        [
            ('features', my_data),
            ('model', ensemble.RandomForestRegressor(min_samples_leaf=1, n_jobs=-1))
        ])
model = {}
model["foo"] = "bar"
pickle.dump(model, file)

I don't have any trouble with the Flask development server, only in the production environment (Apache), and of course I don't want to use the dev server in production.

Any idea why the 504 error occurs in the production environment?

EDIT: This is the method where I use pickle.load(...):

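import pickle

import pandas as pd
from flask import jsonify

def recup_df():
    df = pd.read_pickle("dataframe.pickle")
    # pickle.load expects a file object opened in binary mode, not a path string
    with open("model.pickle", "rb") as fd:
        mod = pickle.load(fd)
    # Predict on the first 20 rows and keep the true prices for comparison
    X = df.head(20).drop(['price'], axis=1)
    y = df.head(20).price.values.copy()
    predict_df = pd.DataFrame.from_dict({
        'predicted': mod.predict(X),
        'true': y,
        'make': X.make,
        'model': X.model
    })
    # Build one entry per row, keyed "result n°1", "result n°2", ...
    prediction = dict()
    for result, data in enumerate(predict_df.itertuples(), start=1):
        str_result = "result n°{}".format(result)
        car_name = "{} {}".format(data.make, data.model)
        prediction[str_result] = {
            car_name: [{
                "true price": data.true,
                "predict price": data.predicted
            }]
        }
    output = {
        "prediction": prediction
    }
    return jsonify(output)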
ayguillo

1 Answer

There is an issue with pickle.dump when it comes to Pipeline objects composed of different transformers.

Here is a previous post about the issue, with relevant solutions: "How to properly pickle sklearn pipeline when using custom transformer".

I tried cloudpickle and it worked with sklearn's Pipeline.
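To illustrate why the linked post may be relevant, here is a minimal sketch using only the standard library. It shows that pickle records a custom class by reference (module name plus class name) rather than by value, so the file only loads where that same name can be imported. `FeatureSelector` and `pickled_class_reference` are hypothetical names made up for the illustration; whether the `features` step in the question is such a custom class is an assumption.

```python
import pickle


class FeatureSelector:
    """Hypothetical stand-in for a custom pipeline step (not from the question)."""

    def __init__(self, columns):
        self.columns = columns


def pickled_class_reference(obj):
    """Return the (module, qualified name) pair that pickle records for obj's class."""
    cls = type(obj)
    return cls.__module__, cls.__qualname__


step = FeatureSelector(["make", "model"])
payload = pickle.dumps(step)

module_name, class_name = pickled_class_reference(step)
# The payload names the class; it does not embed the class's code.
print(module_name.encode() in payload, class_name.encode() in payload)

# Loading only succeeds where module_name.class_name can be resolved again,
# which holds here because we are still in the process that defined the class.
restored = pickle.loads(payload)
print(restored.columns)
```

If the class was defined in the training script itself, the recorded module is typically `__main__`, which refers to a different module once the file is loaded by a web server process.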

fpajot
  • Hello, thank you for your answer. Yes, after defining the pipeline and the hyperparameters, I call `model.fit(X, y)`, and only then do I dump the model. I tried `protocol=2` but I have the same problem. – ayguillo Oct 08 '20 at 11:08
  • OK, could you provide us with the 504 error detail? – fpajot Oct 08 '20 at 11:52
  • I have no details about the error, only these two messages in the log files: `"GET /exemple HTTP/1.1" 504 247 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0"` in "access_log" and `[wsgi:error] [pid 6309] [client 10.101.1.59:23482] Timeout when reading response headers from daemon process 'apache': /var/www/webroot/ROOT/wsgi.py` in "error_log". – ayguillo Oct 08 '20 at 12:45
  • Thanks, it seems your Flask application is taking too long to respond. How long do the pickle load and model.predict() take? Where is the load in your script? The model should be loaded at the top of your Flask script, outside the method that handles the GET request. – fpajot Oct 09 '20 at 07:27
  • `@app.route('/exemple') def recup_df(): fd = open("model.pickle", "rb") mod = pickle.load(fd) fd.close()` This is my code. I also tried loading the model outside the function, and I get the same response. – ayguillo Oct 12 '20 at 07:08
  • OK, I have edited my first post. I use this model for a car pricer. – ayguillo Oct 12 '20 at 07:35
  • OK, I edited my answer. As you said it is related to pickle.dump, and there are several solutions. – fpajot Oct 12 '20 at 07:53
  • Hello. Unfortunately, cloudpickle doesn't work either. All of these solutions work locally, though. – ayguillo Oct 13 '20 at 07:52
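
The suggestion in the comments above (load the model once, outside the request handler, and measure how long the load takes) could be sketched roughly as follows. This uses only the standard library; `load_model` and `timed` are hypothetical helper names, and a small dict written to a temporary file stands in for the real trained pipeline, so the timings say nothing about the actual model.

```python
import pickle
import tempfile
import time
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_model(path):
    """Unpickle the file at `path` once; later calls reuse the cached object."""
    with open(path, "rb") as fd:  # pickle files must be opened in binary mode
        return pickle.load(fd)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call to func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Stand-in for the trained pipeline, written to a temporary file for the demo.
demo_path = str(Path(tempfile.mkdtemp()) / "model.pickle")
with open(demo_path, "wb") as fd:
    pickle.dump({"foo": "bar"}, fd)

first, first_seconds = timed(load_model, demo_path)
second, second_seconds = timed(load_model, demo_path)
print("first load: {:.6f}s, cached load: {:.6f}s".format(first_seconds, second_seconds))
```

In a Flask app, the request handler would call `load_model(...)` (or read a module-level variable set at import time) instead of unpickling on every request, and the same `timed` wrapper could be put around `mod.predict(X)` to see which step is slow.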