I have a PPO-based policy model that I train with RLlib using the Ray Tune API on some standard gym environments (with no fancy preprocessing). I have model checkpoints saved which I can load from and restore for further training.
Now, I want to export my model for production onto a system that should ideally have no dependencies on Ray or RLlib. Is there a simple way to do this?
I know that there is an `export_model` interface in the `rllib.policy.tf_policy` class, but it doesn't seem particularly easy to use. For instance, after calling `export_model('savedir')` in my training script and, in another context, loading via `model = tf.saved_model.load('savedir')`, the resulting `model` object is troublesome to feed the correct inputs into for evaluation (something like `model.signatures['serving_default'](gym_observation)` doesn't work). I'm ideally looking for a method that would allow for easy, out-of-the-box model loading and evaluation on observation objects.