
I have a hypothetical graph that performs a series of computations as follows:

a_0 = tf.placeholder(tf.float32)
a_1 = some_op_1(a_0)
a_2 = some_op_2(a_1)
a_3 = some_op_3(a_2)

Observe that when computing `a_3`, `a_0` and `a_1` are no longer needed, so their memory could be discarded before allocating memory for `a_3`. Is there some way to ask TensorFlow to perform this memory optimization? (I accept that it may cost some time.)

Please note that this is not the same as this question about allocating memory only when needed.

EDIT: This network will not be trained, so don't worry about backprop.

Priyatham
  • It's done automatically -- by the time the `a_3` computation starts, TensorFlow will have already discarded `a_0` and `a_1`. You can use https://github.com/yaroslavvb/memory_util to see a timeline of memory allocations/deallocations and verify this is indeed the case (a sketch follows this comment thread). – Yaroslav Bulatov Mar 06 '17 at 17:27
  • @YaroslavBulatov Your `memory_util` tool doesn't show anything with the GPU, only the CPU. If memory is being deallocated, your tool doesn't show it. I haven't seen your tool work. – adam.hendry Jul 14 '19 at 19:30
  • @A.Hendry which system? There may be an issue with how TF reports memory on Windows – Yaroslav Bulatov Jul 15 '19 at 21:46
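
A minimal sketch of the verification approach from the first comment above, assuming the API shown in the memory_util README (`vlog`, `capture_stderr`, `print_memory_timeline`; exact calls may have changed) and using `tf.square` as a purely hypothetical stand-in for `some_op_*`:

import numpy as np
import tensorflow as tf
import memory_util  # from https://github.com/yaroslavvb/memory_util

memory_util.vlog(1)  # enable TF's verbose memory logging before creating the session

a_0 = tf.placeholder(tf.float32, shape=[1000, 1000])
a_1 = tf.square(a_0)  # stand-in for some_op_1
a_2 = tf.square(a_1)  # stand-in for some_op_2
a_3 = tf.square(a_2)  # stand-in for some_op_3

with tf.Session() as sess:
    with memory_util.capture_stderr() as stderr:
        sess.run(a_3, feed_dict={a_0: np.ones((1000, 1000), np.float32)})

# The timeline should show a_0's and a_1's buffers being freed (or reused)
# before a_3's buffer is allocated.
memory_util.print_memory_timeline(stderr, ignore_less_than_bytes=1000)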

1 Answer


TensorFlow uses reference counting to release the memory used by a tensor as soon as it is no longer used. The values of `a_0` and `a_1` will be deleted as soon as there are no more references to them, and in the latest builds of TensorFlow (post-1.0 nightly builds) some operations will even reuse the input buffer for the output if they have the same shape and element type.
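
For illustration, a sketch of the question's chain with concrete stand-in ops (again `tf.square`, a hypothetical choice): since only `a_3` is fetched, the buffers for `a_0` and `a_1` become unreferenced once their consumers have run, and unary elementwise ops like `tf.square` are among those eligible to forward the input buffer to the output:

import numpy as np
import tensorflow as tf

a_0 = tf.placeholder(tf.float32, shape=[4096, 4096])  # 64 MiB of float32
a_1 = tf.square(a_0)  # elementwise: output may reuse a_0's buffer
a_2 = tf.square(a_1)
a_3 = tf.square(a_2)

with tf.Session() as sess:
    # Only a_3 is fetched; a_0 and a_1 are released as soon as their
    # reference counts drop to zero, not at the end of the step.
    out = sess.run(a_3, feed_dict={a_0: np.ones((4096, 4096), np.float32)})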

mrry
  • I'm guessing that's due to [203a4d98](https://github.com/tensorflow/tensorflow/commit/203a4d98)? ReLU for forward-only networks is the most relevant case; it would be good if reuse happened there. – Yaroslav Bulatov Mar 06 '17 at 18:58
  • Yes, it applies to all unary and binary elementwise ops (including ReLU), and a few others. – mrry Mar 06 '17 at 19:04
  • I am sorry. I asked this question because I am getting OOM errors when feeding double the expected image size to a convolution-only network. I assumed it was due to the above reason. I'll double-check my memory calculations with @YaroslavBulatov's script. – Priyatham Mar 06 '17 at 19:07
  • @priyatham - If you don't have control flow, you can also try running `linearize.linearize()` from https://github.com/yaroslavvb/stuff/tree/master/linearize before the first `session.run` -- that script forces a deterministic, memory-efficient execution order, as opposed to TensorFlow's random order (which can use orders of magnitude more memory in some special cases); a sketch follows this thread. – Yaroslav Bulatov Mar 06 '17 at 19:14
  • Is this really true? Last time I checked, to release a tensor you needed to destroy the entire session, which is not very convenient in some cases. – Overdrivr Jan 15 '18 at 15:34
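
A minimal sketch of the `linearize` suggestion from the comment above, assuming (per that comment, and otherwise hypothetically) that `linearize.linearize()` takes no required arguments and only needs to run once before the first `session.run`:

import numpy as np
import tensorflow as tf
import linearize  # linearize.py from https://github.com/yaroslavvb/stuff/tree/master/linearize

a_0 = tf.placeholder(tf.float32, shape=[1000, 1000])
a_3 = tf.square(tf.square(tf.square(a_0)))  # stand-in for the some_op_* chain

# Add control dependencies that force a deterministic, memory-efficient
# execution order; must be called before the first session.run.
linearize.linearize()

with tf.Session() as sess:
    out = sess.run(a_3, feed_dict={a_0: np.ones((1000, 1000), np.float32)})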