My model alternates between 100 inference steps and one training step, so I want inference to be as fast as possible. I've seen many pages discussing how to freeze a saved graph in TensorFlow, but I'd like fast inference without saving the weights to a file first — just like in PyTorch, which I'm more used to. Saving the weights to a file and then freezing the graph from that file takes a while, and I want to avoid that. In PyTorch, setting volatile=True made inference about twice as fast, so I expect a similar speedup in TensorFlow from freezing a graph.
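For reference, this is the PyTorch pattern I mean (volatile=True was the pre-0.4 API; torch.no_grad() is its modern replacement). The model here is just a toy stand-in:

```python
import torch

# Toy model purely for illustration
model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# Pre-0.4 API the question refers to:
#   x = torch.autograd.Variable(data, volatile=True)

# Modern equivalent: skip autograd bookkeeping during inference
with torch.no_grad():
    y = model(x)

print(y.requires_grad)  # False — no graph is recorded for y
```

I'm looking for whatever plays this role in TensorFlow.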
So, could anyone tell me what the TensorFlow counterparts of volatile and requires_grad are? If they don't exist, what is the recommended way to achieve this? Would tf.stop_gradient or tf.estimator.ModeKeys.PREDICT solve my problem?