I'm running a tensorflow model which is exhausting 60G of RAM in about 10 minutes while processing large images.
I've run heapy to try to pin down a leak, but it shows only 90M of memory usage, and that figure stays constant.
I noted this article: "Python process consuming increasing amounts of system memory, but heapy shows roughly constant usage".
That article suggested the issue might be memory fragmentation within Python (2.7 here), but that doesn't sound like a reasonable explanation for this case.
- I have 2 python `Queue`s. In one thread I read an image from disk and load it into the `raw` queue.
- In another thread I read from the `raw` queue, preprocess the image, and load it into the `ready` queue.
- In my main thread I draw batches of 8 images from the `ready` queue and run them through tensorflow training (the whole pipeline is sketched below).
- With batches of 8 images (each a ~25MB numpy matrix) I should have at least 24 * 25MB of memory held between the current batch and the two queues at any given time. But heapy only shows 90M of consumption.
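
For context, the pipeline looks roughly like this (a simplified sketch with placeholder load/preprocess functions and hypothetical file names, not my actual code):

```python
import threading
import Queue  # Python 2.7 stdlib; renamed "queue" in Python 3
import numpy as np

raw_q = Queue.Queue(maxsize=8)    # images straight from disk
ready_q = Queue.Queue(maxsize=8)  # preprocessed images, ready for training

def load_image(path):
    # Stand-in for the real disk read; yields a ~25MB float32 array.
    return np.zeros((2500, 2500), dtype=np.float32)

def preprocess(img):
    # Stand-in for the real preprocessing step.
    return img * (1.0 / 255.0)

def loader(image_paths):
    # Thread 1: disk -> raw queue.
    for path in image_paths:
        raw_q.put(load_image(path))

def preprocessor():
    # Thread 2: raw queue -> preprocess -> ready queue.
    while True:
        ready_q.put(preprocess(raw_q.get()))

def next_batch(batch_size=8):
    # Main thread: pull one training batch from the ready queue.
    return np.array([ready_q.get() for _ in range(batch_size)])

paths = ["img_%04d.png" % i for i in range(100)]  # hypothetical file names
for target, args in [(loader, (paths,)), (preprocessor, ())]:
    t = threading.Thread(target=target, args=args)
    t.daemon = True
    t.start()

batch = next_batch(8)  # ~8 * 25MB held here, plus whatever sits in the two queues
```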
So heapy is failing to see at least 600M of memory that I know must be held at any given moment.
And if heapy can't see the memory I know is there, I can't trust it to show me where the leak is. Given the rate at which the process is leaking, it's a virtual certainty that the image batches are the cause.
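
As a sanity check, something like the following (a rough sketch, not my actual instrumentation) makes the mismatch visible by printing the OS-level resident size next to heapy's total:

```python
# Rough sketch: compare what the OS reports for this process with what heapy sees.
# On this workload, RSS climbs toward 60G while heapy stays around 90M.
import resource
from guppy import hpy

h = hpy()

def report_memory():
    # On Linux, ru_maxrss is reported in kilobytes.
    rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0
    heapy_mb = h.heap().size / (1024.0 * 1024.0)
    print "peak RSS: %.0f MB   heapy total: %.0f MB" % (rss_mb, heapy_mb)
```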
I'm using the `threading` module in python to kick off the loader and preprocessor threads. I've tried calling `print h.heap()` from within the thread code and from the main code, all with the same results.
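
For completeness, the calls are placed roughly like this (a minimal sketch; the real worker threads do the loading and preprocessing described above, and the main loop runs the tensorflow training step):

```python
import threading
from guppy import hpy

h = hpy()  # a single hpy() instance, shared by the worker threads and main

def worker():
    # Called from inside the loader/preprocessor threads.
    print h.heap()

t = threading.Thread(target=worker)
t.start()
t.join()

# ... and from the main thread, once per training batch:
print h.heap()  # reports the same ~90M total either way
```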