
TensorFlow always (pre-)allocates all free memory (VRAM) on my graphics card, which is ok since I want my simulations to run as fast as possible on my workstation.

However, I would like to log how much memory TensorFlow actually uses in total. Additionally, it would be really nice if I could also log how much memory individual tensors use.

This information is important to measure and compare the memory size that different ML/AI architectures need.

Any tips?

daniel451

2 Answers


Update: you can use TensorFlow ops to query the allocator:

# maximum across all sessions and .run calls so far
sess.run(tf.contrib.memory_stats.MaxBytesInUse())
# current usage
sess.run(tf.contrib.memory_stats.BytesInUse())
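For example, a minimal sketch of logging both values during a session (the queries are ordinary graph ops, so you can build them once and fetch them alongside your other tensors; the variable names are just for illustration):

# Build the probe ops once, outside the training loop.
peak_bytes_op = tf.contrib.memory_stats.MaxBytesInUse()
current_bytes_op = tf.contrib.memory_stats.BytesInUse()

# Fetch both in a single run call; the values are in bytes.
peak_bytes, current_bytes = sess.run([peak_bytes_op, current_bytes_op])
print("peak: %d bytes, current: %d bytes" % (peak_bytes, current_bytes))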

You can also get detailed information about a session.run call, including every memory allocation made during the run, by looking at RunMetadata, i.e. something like this:

run_metadata = tf.RunMetadata()
sess.run(c, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE, output_partition_graphs=True), run_metadata=run_metadata)

Here's an end-to-end example -- take a column vector and a row vector, and add them to get a matrix of pairwise sums:

import tensorflow as tf

no_opt = tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0,
                             do_common_subexpression_elimination=False,
                             do_function_inlining=False,
                             do_constant_folding=False)
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=no_opt),
                        log_device_placement=True, allow_soft_placement=False,
                        device_count={"CPU": 3},
                        inter_op_parallelism_threads=3,
                        intra_op_parallelism_threads=1)
sess = tf.Session(config=config)

with tf.device("cpu:0"):
    a = tf.ones((13, 1))
with tf.device("cpu:1"):
    b = tf.ones((1, 13))
with tf.device("cpu:2"):
    c = a+b

run_metadata = tf.RunMetadata()
sess.run(c, options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE, output_partition_graphs=True), run_metadata=run_metadata)
with open("/tmp/run2.txt", "w") as out:
  out.write(str(run_metadata))

If you open /tmp/run2.txt you'll see messages like this:

  node_name: "ones"

      allocation_description {
        requested_bytes: 52
        allocator_name: "cpu"
        ptr: 4322108320
      }
  ....

  node_name: "ones_1"

      allocation_description {
        requested_bytes: 52
        allocator_name: "cpu"
        ptr: 4322092992
      }
  ...
  node_name: "add"
      allocation_description {
        requested_bytes: 676
        allocator_name: "cpu"
        ptr: 4492163840

So here you can see that a and b allocated 52 bytes each (13 floats × 4 bytes), and the result allocated 676 bytes (13 × 13 × 4).
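If you'd rather sum these up programmatically than grep the text dump, here is a minimal sketch that walks the same run_metadata protobuf (the field names match the dump above; note this naively counts every output allocation, including reused buffers):

# Walk StepStats -> DeviceStepStats -> NodeExecStats -> outputs
# and sum the requested bytes per allocation.
total = 0
for dev_stats in run_metadata.step_stats.dev_stats:
    for node_stats in dev_stats.node_stats:
        for output in node_stats.output:
            alloc = output.tensor_description.allocation_description
            print(node_stats.node_name, alloc.allocator_name, alloc.requested_bytes)
            total += alloc.requested_bytes
print("total requested: %d bytes" % total)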

Yaroslav Bulatov
  • Is there a convenient way to capture an entire device's allocation? By this I mean what proportion of my device's free memory has been allocated? – Aidan Gomez Jan 04 '17 at 15:16
  • Not that I know of. I've been just adding up all the allocation messages and subtracting the deallocation messages from the verbose logs. Could be a good feature request if you have a use case. – Yaroslav Bulatov Jan 04 '17 at 17:32
  • BTW, in the C++ API there's [this call](https://github.com/tensorflow/tensorflow/blob/64edd34ce69b4a8033af5d217cb8894105297d8a/tensorflow/core/kernels/stack_ops.cc#L223) which allows seeing the total memory allocated. It looks like it's not wrapped to be accessible from Python yet; that would be a good feature addition. – Yaroslav Bulatov Jan 21 '17 at 04:10
  • BTW, I just wrapped that C++ call into an op that you can call from Python, here are usage instructions -- https://github.com/yaroslavvb/memory_probe_ops – Yaroslav Bulatov Jan 26 '17 at 05:42
  • @YaroslavBulatov You should update this answer. Your `memory_probe_ops` is now in `tf.contrib`, and it's a really simple way to get the memory usage, i.e. via `tf.contrib.memory_stats.MaxBytesInUse()`. – Albert Sep 29 '17 at 10:34
  • @Albert BTW, MaxBytesInUse gives the maximum across all sessions and all runs so far. I'm adding BytesInUse for more precise stuff -- https://github.com/tensorflow/tensorflow/pull/13107 – Yaroslav Bulatov Sep 29 '17 at 15:08
  • @YaroslavBulatov `MaxBytesInUse()` shows me 1280 but `nvidia-smi` shows 3495MiB / 11439MiB, which is expected since I use `per_process_gpu_memory_fraction=0.3`. What am I missing? – sakisk Mar 15 '18 at 13:48
  • `MaxBytesInUse` is the correct value of memory that's used. The remaining memory is kept allocated by TF but not used. – Yaroslav Bulatov Mar 16 '18 at 21:19
  • @YaroslavBulatov Could you elaborate on your last comment? If you know where in the TF source code this is happening, could you please point me to it? Also, I realize that unless you use allow_growth, TF will allocate all memory up front. However, I have seen that BytesInUse() and MaxBytesInUse() do not show an accurate amount of memory being used; I see an extra 700 MiB in nvidia-smi. Hopefully this makes sense. – dbep Sep 06 '18 at 20:22
  • What is 'current usage'? – mrgloom Nov 09 '19 at 20:03
  • A TF 2.0 version would be nice? Contrib doesn't exist there. – user3496060 Dec 08 '19 at 18:59

Yaroslav Bulatov's answer is the best solution for TF1.

For TF2, however, the contrib package does not exist. The best way is to use TensorFlow's profiler -- https://www.tensorflow.org/guide/profiler#memory_profile_tool
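A minimal sketch of capturing a trace with the profiler API (assuming TF >= 2.2; the log directory name is arbitrary):

import tensorflow as tf

# Everything between start() and stop() is profiled; the trace lands in
# logdir, which you then open in TensorBoard's Profile tab and inspect
# with the Memory Profile tool.
logdir = "/tmp/tf_profile"
tf.profiler.experimental.start(logdir)

# ... run the code you want to measure, e.g. a few training steps ...
a = tf.ones((13, 1))
b = tf.ones((1, 13))
c = a + b

tf.profiler.experimental.stop()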

It will plot a memory utilization graph like this: [screenshot of the Memory Profile tool's memory timeline in TensorBoard]
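If you want the numbers programmatically rather than through TensorBoard, recent TF2 releases can also query the allocator directly. A sketch, assuming TF >= 2.4 and a GPU visible as "GPU:0":

import tensorflow as tf

# Bytes currently/maximally held by TF's allocator on this device --
# not the full reserved pool that nvidia-smi reports.
info = tf.config.experimental.get_memory_info("GPU:0")
print("current: %d bytes, peak: %d bytes" % (info["current"], info["peak"]))

# Reset the peak counter between measurements (TF >= 2.5).
tf.config.experimental.reset_memory_stats("GPU:0")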

Jasper
  • I think this requires TensorBoard and the data format looks very undocumented and therefore unusable for automatic analyses. Did I overlook something? – mxmlnkn Mar 31 '21 at 12:11