
I am using Ubuntu 16.04 and TensorFlow 1.3.

The network has ~17M weights.

Experiments

  1. image size 400x1000, batch size 4, during graph construction:

    failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY

  2. image size 300x750, batch size 4, during graph construction:

    failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY

  3. image size 300x740, batch size 1, during graph construction:

    failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY

So, the memory requested is the same for all three experiments. My questions are: do 17M weights really need such a huge amount of memory? And why doesn't the required memory change with different image sizes and batch sizes?
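
For scale: 34359738368 bytes is exactly 32 GiB (2^35), while the parameters themselves account for only a tiny fraction of that. A minimal back-of-envelope sketch, assuming float32 weights (gradients and optimizer state would only multiply this by a small constant):

    # Rough parameter-memory estimate for a ~17M-weight network.
    num_weights = 17e6
    bytes_per_weight = 4                                   # float32
    param_bytes = num_weights * bytes_per_weight

    print("parameters: %.1f MiB" % (param_bytes / 2**20))  # ~64.8 MiB
    print("requested : %.1f GiB" % (34359738368 / 2**30))  # exactly 32.0 GiB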

Comments
  • Did you try an experiment that worked? How much memory do you have available? (NB: Not familiar with Tensorflow or GPU programming, but it looks like you need to look at your graphics memory.) – jpaugh Aug 22 '17 at 20:42
  • I have two GPU cards; their total memory adds up to 22 GB. – yanchao yang Aug 22 '17 at 20:50
  • I'd try an experiment with the smallest possible sizes (of all variables, e.g. with a 1x1 image) you can. Also, check the program for options whose implicit defaults are set too high. – jpaugh Aug 22 '17 at 20:53
  • BTW, one of your GPU cards is probably dedicated to your computer display and might not be available to other programs. If each of them has 11 GB, then there's the issue. – jpaugh Aug 22 '17 at 20:55
  • [UPDATE] I decreased the weights to 8M; I still get this: failed to alloc 34359738368 bytes on host: CUDA_ERROR_OUT_OF_MEMORY – yanchao yang Aug 22 '17 at 21:14
  • in conv nets memory is usually used up by activations rather than weights – Yaroslav Bulatov Aug 22 '17 at 21:14
  • also note that TF_CUDA_HOST_MEM_LIMIT_IN_MB controls the limit of memory that will be allocated by TF host allocator – Yaroslav Bulatov Aug 22 '17 at 21:17
  • @YaroslavBulatov: I have just updated: the kernels are reduced by half (which means the activations should be reduced by half) and the image size has been decreased to 250x550. I still get this problem, with the same amount of required memory. – yanchao yang Aug 22 '17 at 21:18
  • @yanchaoyang You can use `@username` to ask a specific user to join the conversation. The post owner is always included. – jpaugh Aug 22 '17 at 21:18
  • @yanchaoyang The amount of memory your model needs can be much larger than 34359738368; the error message indicates that the failure happened during allocation of a memory chunk of 34359738368 bytes, which is not directly related to the amount of memory your model needs. – Yaroslav Bulatov Aug 22 '17 at 21:22
  • @yanchaoyang I suspect there's an intermediate tensor of roughly 34359738368 bytes that your computation requires, regardless of batch size or image size; you'd need to do some digging into your model to find it. – Yaroslav Bulatov Aug 22 '17 at 21:24
  • @YaroslavBulatov I have done a rough calculation: suppose the image size is 256x576 and the channels are no more than 512; then the tensor has size 256*576*512*4 bytes = 301,989,888 bytes ≈ 0.3 GB. Ignoring any striding and considering 5 such layers, the response tensors add up to at most 1.5 GB, far less than the 11.0 GB of one GPU card. – yanchao yang Aug 22 '17 at 21:40
  • @yanchaoyang Note that it's not even trying to use your GPU RAM; the error is from the host allocator, meaning it failed when trying to allocate a 34 GB chunk of main RAM. – Yaroslav Bulatov Aug 22 '17 at 21:48
  • @YaroslavBulatov Thanks for the reminder. I guess this is the problem, since when I decrease the image size to something much smaller, for example 80x100, it works and I can use a large batch size. So it seems that the construction process is trying to allocate memory without considering the actual memory needed. Could you point out a way to ask TF to allocate only the memory it actually needs? – yanchao yang Aug 22 '17 at 22:06
  • TF shouldn't allocate 34 GB for no reason; I expect there's a 34 GB tensor created somewhere in your computation. – Yaroslav Bulatov Aug 22 '17 at 22:10
  • @YaroslavBulatov But my guess is that no tensor of 34 GB could be created, as I have illustrated above... Do you have a way to check whether this is true or not? – yanchao yang Aug 22 '17 at 22:39
  • You can enable verbose logging and look for LOG_MEMORY messages to see which op is trying to output 34 GB of data; follow the links in this question for instructions: https://stackoverflow.com/questions/36331419/tensorflow-how-to-measure-how-much-gpu-memory-each-tensor-takes – Yaroslav Bulatov Aug 22 '17 at 22:46
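
Following up on the comments above, here is a minimal sketch (not the poster's actual code) that combines the two suggestions: capping the TF host allocator via TF_CUDA_HOST_MEM_LIMIT_IN_MB, and, if the process gets as far as running a step, dumping per-node memory statistics with RunMetadata full tracing. The limit value and the stand-in graph are illustrative only, and the exact step_stats fields can vary between TF 1.x versions:

    import os
    # Cap the pinned host memory the GPU host allocator may reserve.
    # Must be set before TensorFlow initializes its GPU devices.
    os.environ["TF_CUDA_HOST_MEM_LIMIT_IN_MB"] = "8192"    # example value

    import tensorflow as tf

    # Stand-in graph purely for illustration; substitute your real model here.
    x = tf.random_normal([4, 256, 576, 512])
    y = tf.reduce_sum(x)

    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    with tf.Session() as sess:
        sess.run(y, options=run_options, run_metadata=run_metadata)

        # Report nodes whose peak allocation exceeds ~100 MiB.
        for dev in run_metadata.step_stats.dev_stats:
            for node in dev.node_stats:
                for mem in node.memory:
                    if mem.peak_bytes > 100 * 2**20:
                        print("%s %s peak=%d bytes"
                              % (dev.device, node.node_name, mem.peak_bytes))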

1 Answer


It could be because you are storing a lot of intermediate results. Each time you call sess.run, new host memory is allocated to hold the returned tensor results, and once that new allocation is added, the total memory allocated on your host exceeds 32 GB. Please check the host memory (not GPU memory) used during runtime. If that is the case, you need to reduce your host memory allocation; writing intermediate results to disk instead of keeping them in RAM may be a good choice.
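
As a quick way to confirm whether host RAM really is the bottleneck, here is a minimal sketch (assuming the third-party psutil package is installed) that logs the process's resident host memory around a sess.run call:

    import os
    import psutil   # third-party: pip install psutil

    proc = psutil.Process(os.getpid())

    def log_host_mem(tag):
        # Resident set size of this Python/TensorFlow process, in GiB.
        rss_gib = proc.memory_info().rss / float(2**30)
        print("[%s] host RSS: %.2f GiB" % (tag, rss_gib))

    log_host_mem("before run")
    # result = sess.run(fetches, feed_dict=...)   # your existing training step
    log_host_mem("after run")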