
I am trying to run one of the TF-Slim tutorials, the one where you fine-tune Inception-V3 (~104 MB) on the flowers dataset. My GPU has about 2 GB of memory. Whenever the batch size is larger than 8, I get an error because the GPU runs out of memory. In fact, I get several messages, each looking like:

W tensorflow/core/common_runtime/bfc_allocator.cc:217] Ran out of memory trying to allocate 646.50MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory is available.

and

W tensorflow/core/common_runtime/bfc_allocator.cc:274]     **************************************x*************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 168.8KiB.  See logs for memory state.

Now, it could very well be that my GPU simply does not have enough RAM. However, 2 GB seems more than enough to load a ~100 MB model. Also, with Caffe I could fine-tune AlexNet (~400 MB) with no problem. Besides, I also tried to allow GPU memory growth (which, from what I understand, makes TensorFlow allocate GPU memory gradually instead of reserving it all up front) with

import tensorflow as tf

# Let TensorFlow grab GPU memory incrementally instead of all at once.
session_config = tf.ConfigProto(allow_soft_placement=True)
session_config.gpu_options.allow_growth = True
session_config.gpu_options.allocator_type = 'BFC'

but it doesn't seem to help.
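For reference, this is roughly how the config ends up being used in my setup (a sketch only; slim.learning.train takes a session_config argument as far as I can tell, train_op stands in for whatever the tutorial script builds, and the logdir path is just illustrative):

import tensorflow as tf

slim = tf.contrib.slim

session_config = tf.ConfigProto(allow_soft_placement=True)
session_config.gpu_options.allow_growth = True
session_config.gpu_options.allocator_type = 'BFC'

# Pass the config into the Slim training loop so the session it creates
# actually picks up the GPU options.
slim.learning.train(
    train_op,                        # placeholder: built by the tutorial script
    logdir='/tmp/flowers_finetune',  # illustrative path
    session_config=session_config)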

Do you know whether:

a) I am doing something wrong,
b) the GPU is simply not big enough, or
c) TF-Slim by construction consumes too much memory?

Thanks,

Stratis
  • Now that I think about it, it could be that the layer activations are what occupy the memory (Inception-V3 has many more layers than AlexNet). Still, it would be nice if someone else could verify that. – Stratis Mar 02 '17 at 22:21

1 Answer


Could some other process be using enough GPU memory that not much is left over for TensorFlow? nvidia-smi will tell you how much GPU memory is already in use and which processes are holding it.
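For example, a quick check from Python (the query flags are standard nvidia-smi options; running plain nvidia-smi on the command line prints the same information plus the owning processes):

import subprocess

# Print used vs. total memory for each GPU.
print(subprocess.check_output(
    ['nvidia-smi', '--query-gpu=memory.used,memory.total', '--format=csv']).decode())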

If that's not the case, you might want to look at the allocations to see what is going on. See this other question on how to log allocations from TensorFlow.
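As a rough sketch of what that looks like (assuming you can get at the sess.run call; sess and train_op here are placeholders for whatever your training loop uses):

import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

# Run a single step with full tracing so per-op memory and timing get recorded.
sess.run(train_op, options=run_options, run_metadata=run_metadata)

# Dump a Chrome trace you can open at chrome://tracing.
tl = timeline.Timeline(step_stats=run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())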

Alexandre Passos
  • Hmm, thanks for the idea. I guess I could try to kill the X server, although I find it hard to believe that the Desktop is taking so much of the GPU memory. – Stratis Mar 07 '17 at 13:54