Background / rationale
I have a Python program which fetches event-based data from a measurement instrument, processes it and writes the results to disk. The input events use a fair amount of memory, roughly 10 MB/event. When the input event rate is high, the processing may not keep up, causing events to pile up in the internal queue. This goes on until the available memory is almost used up, at which point the program directs the instrument to throttle the acquisition rate (which works, but reduces accuracy). This moment is detected by watching the available system memory using psutil.virtual_memory().available. For the best results, the throttling should be disabled as soon as enough memory has been freed by processing events. This is where the trouble comes in.
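The check itself looks roughly like this (a simplified sketch, not the real code — the instrument API name set_rate_limit and the water marks are made up):

```python
import psutil

def update_throttle(instrument, throttled: bool,
                    low=2 * 2**30, high=6 * 2**30) -> bool:
    """Hysteresis on psutil's available-memory reading.

    The thresholds are hypothetical; returns the new throttle state.
    """
    avail = psutil.virtual_memory().available
    if not throttled and avail < low:
        instrument.set_rate_limit(True)   # made-up instrument API
        return True
    if throttled and avail > high:
        instrument.set_rate_limit(False)
        return False
    return throttled
```

The hysteresis avoids rapid toggling, but the scheme only works if the "available" reading actually rises again after the queue drains.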
As it turns out, the CPython interpreter does not (or not always) return freed memory to the OS, which makes psutil (and also the gnome-system-monitor) report insufficient available memory. However, the memory actually is available: manually disabling the throttling fills the queue again without the total consumption increasing further, unless even more events are placed into the queue than before.
The following example demonstrates this behaviour. On my computer, roughly 50% of the invocations showed the problem, while the rest properly freed the memory. A few times the memory was freed at the end of iteration 0, but not at the end of iterations 1 and 2, so the behaviour seems to be somewhat random.
#!/usr/bin/env python3
import psutil
import time
import queue
import numpy as np


def get_avail() -> int:
    avail = psutil.virtual_memory().available
    print(f'Available memory: {avail/2**30:.2f} GiB')
    return avail


q: 'queue.SimpleQueue[np.ndarray]' = queue.SimpleQueue()
for i in range(3):
    print('Iteration', i)
    # Allocate data for 90% of available memory.
    for i_mat in range(round(0.9 * get_avail() / 2**24)):
        q.put(np.ones((2**24,), dtype=np.uint8))
    # Show remaining memory.
    get_avail()
    time.sleep(5)
    # The data is now processed, releasing the memory.
    try:
        n = 0
        while True:
            n += q.get_nowait().max()
    except queue.Empty:
        pass
    print('Result:', n)
    # Show remaining memory.
    get_avail()
    print(f'Iteration {i} ends')
    time.sleep(5)
print('Program done.')
get_avail()
The expected behaviour would be low available memory before the result is printed and high afterwards:
Iteration 0
Available memory: 22.24 GiB
Available memory: 2.17 GiB
Result: 1281
Available memory: 22.22 GiB
Iteration 0 ends
However, it may also end up like this:
Iteration 1
Available memory: 22.22 GiB
Available memory: 2.19 GiB
Result: 1280
Available memory: 2.36 GiB
Iteration 1 ends
Integrating explicit calls to the garbage collector (after an import gc) like

print('Result:', n)
# Show remaining memory.
get_avail()
gc.collect(0)
gc.collect(1)
gc.collect(2)
get_avail()
print(f'Iteration {i} ends')

does not help; the memory may still stay used.
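The distinction can be observed with tracemalloc, which counts allocations made through the interpreter's allocators: after the last reference is dropped, the traced size collapses immediately (refcounting, no gc needed) even when the OS-visible resident size does not. A minimal sketch — whether RSS actually drops here depends on the allocator:

```python
import os
import tracemalloc
import psutil

proc = psutil.Process(os.getpid())
tracemalloc.start()

# ~64 MiB of allocations traced by the interpreter.
data = [bytearray(2**20) for _ in range(64)]
traced_full, _ = tracemalloc.get_traced_memory()

del data  # drop all references
traced_empty, _ = tracemalloc.get_traced_memory()

print(f'traced: {traced_full / 2**20:.1f} -> {traced_empty / 2**20:.1f} MiB')
print(f'rss:    {proc.memory_info().rss / 2**20:.1f} MiB')
```

This shows what Python considers live, but not how much of the difference has been handed back to the OS — which is exactly the number the throttling logic would need.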
I'm aware that there are workarounds, e.g. checking the queue size instead of the available memory. But that would make the system more prone to resource exhaustion in case some other process happens to consume lots of memory. Using multiprocessing would not fix the issue either: the event fetching must be done single-threaded, so the fetching process would always face the same problem with its queue.
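(For completeness, the queue-size check I mean would look roughly like this sketch; the limit of 1000 events is made up, about 10 GB at 10 MB/event:)

```python
import queue

MAX_QUEUED = 1000  # made-up limit

def should_throttle(q: 'queue.SimpleQueue') -> bool:
    # qsize() is approximate, but adequate for a soft limit;
    # it knows nothing about memory used by other processes, though.
    return q.qsize() >= MAX_QUEUED
```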
Questions
How can I query the interpreter's memory management to find out how much memory is used by referenced objects and how much is just reserved for future use and not given back to the OS?
How can I force the interpreter to give reserved memory back to the OS, so that the reported available memory actually increases?
The target platform is Ubuntu 20.04+ with CPython 3.8+; support for earlier versions or other flavours is not required.
Thanks.