During inference, we don't need to keep the activations from the previous layers as we propagate through the network. However, since we are not explicitly telling the program to discard them, it does not differentiate between training and inference passes. Is there a way (perhaps an easy flag, class, or method) to do this kind of memory management in TensorFlow? Would simply using `tf.stop_gradient` work?

Atila Orhon
- This is done automatically. TensorFlow creates a new execution plan for each `.run` call, so if you don't request gradients, those activations will be discarded. – Yaroslav Bulatov Jul 13 '17 at 16:10
- @YaroslavBulatov thanks for the clarification! – Atila Orhon Jul 13 '17 at 20:34
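To illustrate the point in the comment above, here is a minimal sketch (TF 1.x API, with a hypothetical toy model and placeholder data): `Session.run` only executes the ops needed to produce the requested fetches, so fetching just the predictions never runs the gradient ops or holds their activation buffers.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model -- shapes and names are placeholders
x = tf.placeholder(tf.float32, [None, 784], name="x")
y = tf.placeholder(tf.float32, [None, 10], name="y")
w = tf.Variable(tf.zeros([784, 10]))
logits = tf.matmul(x, w)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

x_batch = np.zeros((32, 784), dtype=np.float32)  # dummy data
y_batch = np.zeros((32, 10), dtype=np.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Training step: fetching train_op forces gradients (and the
    # activations they depend on) to be computed and kept alive.
    sess.run(train_op, feed_dict={x: x_batch, y: y_batch})
    # Inference step: only ops reachable from `logits` are executed,
    # so the gradient subgraph never runs and its memory is never held.
    predictions = sess.run(logits, feed_dict={x: x_batch})
```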
1 Answer
The easiest way is to "freeze" (TensorFlow's terminology) your model using their `freeze_graph.py` script.
This script essentially removes all unnecessary operations, replaces all variables with constants, and then exports the resulting graph back to disk.
For this, you need to specify which outputs of your graph are used during inference. Nodes that cannot reach those outputs (typically summaries, losses, gradients and the like) are automatically discarded.
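For illustration only, a typical invocation of the script might look like the following; the paths and the output node name are placeholders for your own model:

```bash
python -m tensorflow.python.tools.freeze_graph \
  --input_graph=/path/to/graph.pbtxt \
  --input_checkpoint=/path/to/model.ckpt \
  --input_binary=false \
  --output_graph=/path/to/frozen_graph.pb \
  --output_node_names=softmax
```

`--output_node_names` takes a comma-separated list of node names; this is how you tell the tool which outputs your inference actually needs.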
Once the backward passes are eliminated, TensorFlow can optimize its memory usage and, in particular, automatically free or reuse the memory taken by unused nodes.
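As a rough sketch of what inference with the frozen graph can look like afterwards (TF 1.x API; the file path, tensor names and shapes are assumptions):

```python
import numpy as np
import tensorflow as tf

# Load the frozen, inference-only GraphDef (path is a placeholder)
with tf.gfile.GFile("/path/to/frozen_graph.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Import it into a fresh graph: no variables, no training ops, no gradients
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

input_batch = np.zeros((1, 784), dtype=np.float32)  # dummy input

with tf.Session(graph=graph) as sess:
    # Tensor names depend on how the original graph was built -- assumed here
    x = graph.get_tensor_by_name("x:0")
    softmax = graph.get_tensor_by_name("softmax:0")
    result = sess.run(softmax, feed_dict={x: input_batch})
```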

P-Gn
- Strangely enough, I get exactly the opposite effect in terms of RAM use: when I `freeze_graph` the model, memory consumption during inference increases 1.5-2 times. – Aleksei Petrenko Oct 02 '17 at 12:09