  • I have a simple CNN (4 conv-pool-lrelu layers and 2 fully connected ones).
  • I am running TensorFlow on CPU only (no GPU).
  • I have ~6GB of available memory.
  • My batches are composed of 56 images of 640x640 pixels (< 100 MB).

And TensorFlow is consuming more than the available memory (causing the program to crash, obviously).

My question is: why does TensorFlow require this much memory to run my network? I don't understand what is taking this much space (maybe caching the data several times to optimize the convolution computation? Saving all the hidden outputs for backpropagation purposes?). And is there a way to prevent TensorFlow from consuming this much memory?

Side notes :

  • I cannot reduce the batch size: I am doing Multiple Instance Learning, so I need to compute all my patches in one run.
  • I am using an AdamOptimizer.
  • All my convolutions are 5x5 windows with 1x1 stride, with (from the first to the last) 32, 64, 128 and 256 features. I am using leaky ReLUs and 2x2 max pooling. The FC layers are composed of 64 and 3 neurons (see the sketch just after this list).
  • Using Ubuntu 16.04 / Python 3.6.4 / TensorFlow 1.6.0
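
For reference, here is a minimal sketch of how a network matching this description might look in TensorFlow 1.6 (tf.layers API). The leak coefficient and the pooling stride of 2 are my assumptions; the 5x5 kernels, filter counts, leaky ReLUs, 2x2 pooling and FC sizes follow the notes above.

    # Minimal sketch of the described architecture (TensorFlow 1.6, tf.layers API).
    # The leak coefficient and pooling stride are assumptions; kernel sizes, filter
    # counts and FC sizes follow the description above.
    import tensorflow as tf

    def leaky(t):
        return tf.nn.leaky_relu(t, alpha=0.1)  # assumed leak coefficient

    def build_network(images):  # images: [batch, 640, 640, channels]
        x = images
        for filters in (32, 64, 128, 256):
            x = tf.layers.conv2d(x, filters, kernel_size=5, strides=1,
                                 padding='valid', activation=leaky)
            x = tf.layers.max_pooling2d(x, pool_size=2, strides=2)
        x = tf.layers.flatten(x)
        x = tf.layers.dense(x, 64, activation=leaky)
        return tf.layers.dense(x, 3)  # logits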

1 Answer


As you have mentioned:

All my convolutions are 5x5 windows, 1x1 stride, with (from 1st one to last one) 32, 64, 128 and 256 features. I am using leaky ReLUs and 2x2 max pooling. FC layers are composed of 64 and 3 neurones.

So, the memory consumption of your network goes like this:

Input: 640x640x3 ≈ 1200 KB

C1: 636x636x32 = 12.5 MB (stride = 1)

P1: 635x635x32 = 12.3 MB (stride = 1)

C2: 631x631x64 = 24.3 MB

P2: 630x630x64 = 24.2 MB

C3: 626x626x128 = 47.83 MB

P3: 625x625x128 = 47.68 MB

C4: 621x621x256 = 94.15 MB

P4: 620x620x256 = 93.84 MB

FC1: 64 = 0.0625 KB (negligible)

FC2: 3 = 0.003 KB (negligible)

Total for one image ≈ 358 MB

For a batch of 56 images: 56 x 358 MB ≈ 19.6 GB

That's why your network does not fit in 6 GB. Try a larger stride or smaller images to bring it under 6 GB, and it should work.
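
For reference, here is a small script reproducing the arithmetic above. It assumes 1 byte per value and stride-1 pooling, matching the table; both assumptions come up again in the comments below.

    # Reproduces the per-layer activation sizes above: 5x5 'valid' convolutions and
    # 2x2 pooling, both with stride 1, and 1 byte per value (float size and pooling
    # stride are revisited in the comments below).
    side = 640
    total_bytes = side * side * 3                # input image
    for filters in (32, 64, 128, 256):
        side -= 4                                # 5x5 convolution, 'valid' padding
        total_bytes += side * side * filters     # conv output
        side -= 1                                # 2x2 pooling with stride 1
        total_bytes += side * side * filters     # pool output
    total_bytes += 64 + 3                        # fully connected outputs

    print('one image  : ~%.0f MB' % (total_bytes / 2**20))        # ~358 MB
    print('batch of 56: ~%.1f GB' % (56 * total_bytes / 2**30))   # ~19.6 GB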

You can refer to this to better understand how memory consumption is calculated.

  • Hmm... What I don't understand is that, after P1's output is calculated, we no longer need to remember C1's output; we can throw it away and free the memory. The same goes for P1's output once C2's has been calculated. So the maximum memory allocated at a given time should never be the whole sum (1200+12800+12640.5+25281+24964+49928+49298+98596+97344+64+3), but just a local part of it. :-/ – Motiss Apr 30 '18 at 07:14
  • And by the way, my POOL layers don't overlap, so the size of the data is reduced by a factor of 4 after each POOL. – Motiss Apr 30 '18 at 07:16
  • 1
    You are saying you have stride=2 for POOL. So, in that case memory consumtion of network will be ~1.6GB in one forward pass for a batch.(There may be some backward pass memory consumption also). Check all the parameters correctly. – Akash Goyal May 01 '18 at 09:37
  • Hmm... If I look at your calculations, you say FC2 = 3 B, implying 1 B per output, but everything is actually a tensorflow.float32, implying 4 B. Correcting this, we may have 1.6 GB * 4 > 6 GB. But it still doesn't explain why TensorFlow would memorize all the hidden outputs when they are no longer needed for the computation. – Motiss May 02 '18 at 03:25
  • 1
    If you are doing only forward propagation, you donot need the hidden outputs. But all hidden outputs are needed for weight computations during backward propagation. Check any example of how weights are calculated during backprop. – Akash Goyal May 02 '18 at 04:00
  • This is the moment I'm supposed to blame myself for being so stupid. :D Thanks a lot! – Motiss May 02 '18 at 07:06
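
To make that last point concrete, here is a tiny NumPy sketch (a single hypothetical dense layer with a squared-error loss, not the question's network) showing that a layer's input activation appears directly in its weight gradient, which is why the hidden outputs have to stay in memory until the backward pass reaches them.

    # Backward pass for one dense layer y = x.dot(W) under a squared-error loss.
    # dL/dW = x.T.dot(dL/dy): the layer's *input* activation x is needed again
    # during backprop, so the framework keeps it in memory after the forward pass.
    import numpy as np

    np.random.seed(0)
    x = np.random.randn(56, 128)        # hidden output from the previous layer
    W = np.random.randn(128, 3)
    target = np.random.randn(56, 3)

    y = x.dot(W)                        # forward pass
    grad_y = 2 * (y - target)           # dL/dy for L = sum((y - target)**2)
    grad_W = x.T.dot(grad_y)            # dL/dW: requires the stored activation x
    print(grad_W.shape)                 # (128, 3)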