
I am trying to serve a machine learning model via an API using Flask's Blueprints. Here is my Flask `__init__.py` file:

from flask import Flask

def create_app(test_config=None):
    app = Flask(__name__)

    @app.route("/healthcheck")
    def healthcheck() -> str:
        return "OK"

    # Registers the machine learning blueprint
    from . import ml
    app.register_blueprint(ml.bp)

    return app

The `ml.py` file, which contains the blueprint for the `/ml` endpoint:

import numpy as np
from . import configuration as cfg
import tensorflow as tf

from flask import (
    Blueprint, flash, request, url_for
)


bp = Blueprint("ml", __name__, url_prefix="/ml")
keras_model = None
graph = None

@bp.before_app_first_request
def load_model():
    print("Loading keras model")
    global keras_model
    global graph
    with open(cfg.config["model"]["path"], 'r') as model_file:
        yaml_model = model_file.read()
        keras_model = tf.keras.models.model_from_yaml(yaml_model)
        graph = tf.get_default_graph()
        keras_model.load_weights(cfg.config["model"]["weights"])

@bp.route('/predict', methods=['POST'])
def predict() -> str:
    global graph
    features = np.array([request.get_json()['features']])
    print(features, len(features), features.shape)
    with graph.as_default():
        prediction = keras_model.predict(features)
    print(prediction)
    return "%.2f" % prediction

I run the server using a command-line script:

#!/bin/bash 

export FLASK_APP=src
export FLASK_ENV=development
flask run

If I go to `localhost:5000/healthcheck` I get the OK response, as I should. Then I run the following curl:

curl -X POST \
  http://localhost:5000/ml/predict \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
 "features" : [17.0, 0, 0, 12.0, 1, 0, 0]
}'

The first time, I get the response `[[1.00]]`; if I run it again I get the following error:

tensorflow.python.framework.errors_impl.FailedPreconditionError: 
Error while reading resource variable dense/kernel from
Container: localhost. This could mean that the variable was uninitialized. 
Not found: Container localhost does not exist. (Could not find resource: localhost/dense/kernel)
         [[{{node dense/MatMul/ReadVariableOp}}]]

If I modify the blueprint file, the development server detects the change and reloads; I can then call the API again and it returns the correct result for the first call, after which I am back to the error. Why does this happen? And why only for the calls after the first one?

Victor Capone
  • Does it survive if you directly run the relevant parts (so initialize things, and run the internals of `predict()` twice, with a hardcoded `features = np.array([[17.0,0,0,12.0,1,0,0]])`)? – tevemadar Feb 19 '19 at 19:05
  • No, I get the same error. I don't get the "run the internals of `predict()`" part though; it is a class function of the Keras model, so I don't have access to it. Still, I tried to run with the hardcoded value and it failed all the same – Victor Capone Feb 19 '19 at 19:14
  • My question in other words: does it work without the web context? Practically if you make a copy of `ml.py`, remove `flask` references and annotations, put a hardcoded value into `features`, and run `load_model()` once, and `predict()` (with the hardcoded value) twice, would it work properly? – tevemadar Feb 20 '19 at 10:01
  • Yep, it does work, I am suspecting it has something to do with Flask's "threading" – Victor Capone Feb 21 '19 at 12:14
  • https://stackoverflow.com/questions/19277280/preserving-global-state-in-a-flask-application might be interesting for you. Both `flask.g` and the remark on thread-safety. Unless the usage of your model is strictly read-only, you have to worry about possible simultaneous requests too. – tevemadar Feb 21 '19 at 13:43
  • Interesting, this seems to imply that I should load my model with every request (which I tested using @bp.before_app_request and it works), also, there is a comment in that answer that says that as of Flask 0.12 flask.g stores stuff in the application context rather than request context, is this the right way to do this then? Store my model in the app (or blueprint) context when the app is initialized? – Victor Capone Feb 21 '19 at 14:17
  • As far as I understand, `.predict()` does not store/alter anything, thus a one-time initialization is enough, and `before_app_first_request` or `before_first_request` could be fine. – tevemadar Feb 21 '19 at 14:30
  • Initializing the model using before_app_first_request causes the same error as before, although initializing it every time with before_app_request works fine – Victor Capone Feb 21 '19 at 16:13
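
For reference, the per-request workaround mentioned in the last comment is a sketch along these lines, reusing the names from the question's `ml.py` (it reloads the model inside every request, which avoids the error at the cost of re-reading the model files on each call):

@bp.before_app_request
def load_model():
    # Reloading on every request means the model's variables always live in
    # the default graph/session of the thread that will run predict()
    global keras_model, graph
    with open(cfg.config["model"]["path"], "r") as model_file:
        keras_model = tf.keras.models.model_from_yaml(model_file.read())
    graph = tf.get_default_graph()
    keras_model.load_weights(cfg.config["model"]["weights"])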

1 Answer


You can try creating a reference to the session that is used for loading the models, and then setting it as the session to be used by Keras in each request. That is, do the following:

import tensorflow as tf
from tensorflow.python.keras.backend import set_session
from tensorflow.python.keras.models import load_model

tf_config = some_custom_config  # placeholder: your own tf.ConfigProto, or omit the config argument
sess = tf.Session(config=tf_config)
graph = tf.get_default_graph()

# IMPORTANT: models have to be loaded AFTER SETTING THE SESSION for Keras!
# Otherwise, their weights will be unavailable in the threads after the session has been set
set_session(sess)
model = load_model(...)

and then in each request:

global sess
global graph
# Re-bind the graph and session that the model was loaded into,
# since this request may be served on a different thread
with graph.as_default():
    set_session(sess)
    model.predict(...)
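
Applied to the `ml.py` from the question, a minimal sketch could look like the following (assuming the same `configuration` module and YAML model/weights files from the question; only the session handling is new):

import numpy as np
import tensorflow as tf
from tensorflow.python.keras.backend import set_session

from flask import Blueprint, request

from . import configuration as cfg

bp = Blueprint("ml", __name__, url_prefix="/ml")

sess = None
graph = None
keras_model = None

@bp.before_app_first_request
def load_model():
    global sess, graph, keras_model
    sess = tf.Session()
    graph = tf.get_default_graph()
    # Load the model AFTER setting the session, so its variables live in `sess`
    set_session(sess)
    with open(cfg.config["model"]["path"], "r") as model_file:
        keras_model = tf.keras.models.model_from_yaml(model_file.read())
    keras_model.load_weights(cfg.config["model"]["weights"])

@bp.route("/predict", methods=["POST"])
def predict() -> str:
    features = np.array([request.get_json()["features"]])
    # Re-bind the graph and session the model was loaded into; Flask's
    # development server may handle this request on a different thread
    # than the one that ran load_model()
    with graph.as_default():
        set_session(sess)
        prediction = keras_model.predict(features)
    return "%.2f" % prediction

This also matches the threading suspicion from the comments: `before_app_first_request` runs on whichever thread happens to serve the first request, and TensorFlow 1.x tracks the default graph and session per thread, so later requests served on other threads cannot see the already-initialized variables unless the graph and session are re-bound explicitly.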
DesiKeki