
I have a Python program parallelized with joblib.Parallel. However, as you can see in this top screenshot, each process is using much less than 100% of a CPU, and the process state is "D", i.e., waiting for IO.

[top screenshot of the parallelized program]

The program runs a function once for each of 10,000 (very small) datasets. Each function execution takes a few minutes and, besides doing calculations, queries an SQLite database via SQLAlchemy (reading only), loading quite a bit of data into memory.
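
For context, the structure is roughly the following (the table and function names are placeholders, not my real code):

from joblib import Parallel, delayed
from sqlalchemy import create_engine, text

def expensive_computation(rows):
    # stand-in for the real calculation, which takes a few minutes
    return sum(len(r) for r in rows)

def process_dataset(dataset_id):
    # each worker creates its own engine, since SQLAlchemy connection
    # pools should not be shared across processes
    engine = create_engine("sqlite:///data.db")
    with engine.connect() as conn:
        rows = conn.execute(
            text("SELECT * FROM measurements WHERE dataset_id = :d"),
            {"d": dataset_id},
        ).fetchall()
    return expensive_computation(rows)

results = Parallel(n_jobs=-1)(
    delayed(process_dataset)(i) for i in range(10000)
)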

I suspect that the memory loading, and perhaps even a memory leak, is causing the slowdown, but it may also come from other parts of the program.

Is there any way to get the Python function stack at the point where the IO is stalling, when running parallelized?

For CPU profiling, I usually use cProfile. Here, however, I need to understand memory issues and IO blocking. A further complication is that these issues do not occur when I run only one process, so I need a method that can deal with multiple processes.
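
For reference, this is the kind of single-run profiling I mean (using the process_dataset placeholder from above):

import cProfile
import pstats

# profile one function call and print the 20 most expensive entries;
# this works fine serially but says nothing about the joblib workers
cProfile.run("process_dataset(0)", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(20)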

For memory profiling, I see from other questions that there are object-counting tools and allocation trackers such as guppy3/heapy. Here, however, I think a stack trace (showing which part of the code is stalling or memory-heavy) would be more helpful than knowing which objects are involved.


1 Answer


The standard-library tracemalloc module can show stack traces for memory allocations.

Probably something like the following could work:

import tracemalloc

# store up to 25 frames per allocation so that the 'traceback'
# comparison below yields useful stack traces (the default is 1 frame)
tracemalloc.start(25)

snapshot1 = tracemalloc.take_snapshot()
# ... call the function leaking memory ...
snapshot2 = tracemalloc.take_snapshot()

# group the allocation differences by full traceback
top_stats = snapshot2.compare_to(snapshot1, 'traceback')

# print the biggest difference and the stack that allocated it
stat = top_stats[0]
print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024))
for line in stat.traceback.format():
    print(line)
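
Note that tracemalloc only tracks allocations in the process where it was started, so with joblib.Parallel you would start it inside each worker rather than in the parent. A rough, untested sketch (process_dataset stands for your per-dataset function):

import tracemalloc
from joblib import Parallel, delayed

def traced(func, *args):
    # start tracing inside the worker process, keeping 25 frames
    # per allocation so the tracebacks are informative
    tracemalloc.start(25)
    before = tracemalloc.take_snapshot()
    result = func(*args)
    after = tracemalloc.take_snapshot()
    # print the three largest allocation differences with their stacks
    for stat in after.compare_to(before, 'traceback')[:3]:
        print(stat)
        for line in stat.traceback.format():
            print(line)
    tracemalloc.stop()
    return result

results = Parallel(n_jobs=-1)(
    delayed(traced)(process_dataset, i) for i in range(10000)
)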