
I have found plenty of questions regarding the TensorFlow warning

```
tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of xxxxxxxxx exceeds 10% of system memory
```

I know that it is just a warning, not an error, and that its display can be suppressed. I also know how to (most likely) address the issue: reduce the batch size.
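For completeness, this is what I mean by suppressing its display, as a minimal sketch (the environment variable has to be set before TensorFlow is imported):

```python
import os

# 0 = all messages, 1 = filter INFO, 2 = filter INFO and WARNING
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf  # import only after the variable is set
```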

However, I have never found the issue addressed the other way around:

Given my network and the current state of the system on which it runs, what is the largest batch size it can fit without problems?

So, is there a way to access what TensorFlow is doing internally from within the Python (3.7) interface?

I'm running TF 2.2.0 on my CPU. I know that there are ways to limit GPU memory usage (let it grow, don't use 100% of the available space, etc.), but I have not found an equivalent for the CPU. Creating a logical device with a memory limit is not supported for CPUs (https://www.tensorflow.org/api_docs/python/tf/config/set_logical_device_configuration).
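For reference, these are the GPU-side options I mean; the logical-device call is sketched with the names from the linked docs (in older releases it may still live under `tf.config.experimental`), and the `memory_limit` variant is exactly the one that is not available for CPUs:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
if gpus:
    # Either grow the allocation on demand instead of grabbing all memory up front...
    tf.config.experimental.set_memory_growth(gpus[0], True)

    # ...or cap the usable memory with a logical device (only one of the two can be
    # applied to a given device; a memory_limit like this is GPU-only).
    # tf.config.set_logical_device_configuration(
    #     gpus[0],
    #     [tf.config.LogicalDeviceConfiguration(memory_limit=1024)],  # limit in MB
    # )
```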

  • This is not really an answer, but I typically start with a relatively large batch size (large enough to throw an out-of-memory error) and then decrease the batch size until I can train (a rough sketch of this trial-and-error loop follows after these comments). – jkr Jul 01 '20 at 16:22
  • Usually this approach is good, yes, but I want to write code that does some machine learning optimization "in the background", without the user having to worry about these things too much. At the same time, I want to use the available resources as efficiently as possible. – clAuS Jul 04 '20 at 20:15
  • 1
    You can find the maximum batch size using `Max batch size= available GPU memory bytes / 4 / (size of tensors + trainableparameters)` for more details on this follow this post https://stackoverflow.com/questions/46654424/ , for CPU as of now you can only do empirical observations like monitoring the CPU usage using PID or different subprocess. –  Nov 12 '20 at 04:50
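A sketch of the trial-and-error approach from the first comment. `build_model` and the `x`/`y` arrays are placeholders for whatever model and data are actually used, and on CPU the failure may also show up as a plain `MemoryError` (or the OS killing the process) rather than a TensorFlow exception:

```python
import tensorflow as tf

def largest_working_batch_size(build_model, x, y, start=4096):
    """Halve the batch size until a single training step fits in memory."""
    batch_size = start
    while batch_size >= 1:
        try:
            model = build_model()  # fresh compiled model for each attempt
            model.train_on_batch(x[:batch_size], y[:batch_size])
            return batch_size
        except (tf.errors.ResourceExhaustedError, MemoryError):
            batch_size //= 2  # too big, back off and retry
    raise RuntimeError("Even batch_size=1 does not fit in memory")
```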
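And a back-of-the-envelope version of the formula from the last comment, adapted to CPU by taking the available RAM from `psutil`. Here `activations_per_sample` (the number of float32 values the forward/backward pass holds per sample) is a placeholder that has to be estimated for the specific network:

```python
import psutil
import tensorflow as tf

def rough_max_batch_size(model, activations_per_sample):
    """Upper-bound estimate: available RAM / 4 bytes / (activations + weights)."""
    available_bytes = psutil.virtual_memory().available
    trainable_params = sum(int(tf.size(w)) for w in model.trainable_weights)
    # 4 bytes per float32 value; optimizer slots, temporary copies and framework
    # overhead are ignored, so treat the result as an optimistic bound.
    return available_bytes // 4 // (activations_per_sample + trainable_params)
```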

0 Answers