I run a service that performs inference on a stateful LSTM using Keras, and I am wondering what the threading semantics are. I am not asking how to store models per Flask session; I am interested in what happens under the hood in Keras when I run stateful models. For example, is creating a new model per thread overkill? Does Keras handle state per thread automatically?
This question is different from the suggested duplicate because the duplicate deals explicitly with storing objects per Flask session, whereas this question is about how Keras handles stateful models across threads.
My code for inference is like so:
Loading:
from tensorflow.keras.models import model_from_json

with open(f"{ROOT_DIR}/../../bias-model/bias-model.json", "r") as f:
    MODEL = model_from_json(f.read())
MODEL.load_weights(f"{ROOT_DIR}/../../bias-model/bias-model.h5")
Inference:
for i in range(batch_input.shape[0]):
    # Take the last timestep's output for this sequence
    prediction = MODEL.predict_on_batch(batch_input[i])[-1][0]
# Clear the LSTM state before handling the next input
MODEL.reset_states()
Because I reset states on the model, does this mean I should create a new model per thread? Or should I take a lock around predictions on the globally created bias model? Or is there some other mechanism I am missing?
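To make the lock option concrete, here is a minimal sketch of what I mean by serializing access to the shared model. This is hypothetical illustration code: `fake_predict` stands in for the real `MODEL.predict_on_batch(...)` / `MODEL.reset_states()` cycle, and the point is only that one request's predict/reset cannot interleave with another's.

```python
import threading

# Guard for the single shared (stateful) model
MODEL_LOCK = threading.Lock()

call_log = []  # records which thread ran which step, to show interleaving

def fake_predict(thread_id, n_steps):
    # Placeholder for the real predict_on_batch loop + reset_states
    for step in range(n_steps):
        call_log.append((thread_id, step))

def predict_serialized(thread_id, n_steps):
    # Holding the lock for the whole predict/reset cycle keeps the LSTM
    # state from being touched by a concurrent request mid-cycle.
    with MODEL_LOCK:
        fake_predict(thread_id, n_steps)

threads = [threading.Thread(target=predict_serialized, args=(i, 3))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Because of the lock, each thread's 3 steps form an uninterrupted run
runs = [call_log[i:i + 3] for i in range(0, len(call_log), 3)]
```

The obvious cost is that all requests are serialized through one model, which may or may not be acceptable for my throughput.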
I should add that I am running TF 2.3.1 with Keras 2.4.3; solutions I find when researching are often incompatible with these versions.
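For completeness, the model-per-thread alternative I mentioned could be sketched with `threading.local`, so each worker thread lazily builds its own copy and no lock is needed. Again this is a hypothetical sketch: `build_model` is a placeholder for the `model_from_json` / `load_weights` code above.

```python
import threading

_local = threading.local()

def build_model():
    # Stand-in for the real loading code (model_from_json + load_weights);
    # here we just return a dict tagged with the building thread's name.
    return {"built_by": threading.current_thread().name}

def get_model():
    # Build the model at most once per thread, on first use
    if not hasattr(_local, "model"):
        _local.model = build_model()
    return _local.model

results = []

def worker():
    results.append(get_model()["built_by"])

threads = [threading.Thread(target=worker, name=f"w{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The trade-off here would be memory: each worker thread holds a full copy of the model's weights.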