
I wrote a script that I deploy on an HPC node with 112 cores, so it starts 112 worker processes that work through the 400 tasks needed (node_combinations is a list of 400 tuples). The relevant snippet of code is below:

# Parallel Path Probability Calculation
# =====================================
import datetime
import logging
from multiprocessing import Pool

node_combinations = [(i, j) for i in g.nodes for j in g.nodes]
pool = Pool()
start = datetime.datetime.now()
logging.info("Start time: %s", start)
print("Start time: ", start)
pool.starmap(g._print_probability_path_ij, node_combinations)
end = datetime.datetime.now()
print("End time: ", end)
print("Run time: ", end - start)
logging.info("End time: %s", end)
logging.info("Total run time: %s", end - start)
pool.close()
pool.join()

I monitor performance with htop and observed the following. Initially, all 112 cores are working at 100%. Eventually, since some tasks are shorter than others, I am left with a smaller number of cores working at 100%. Finally, all processes are shown as sleeping.

I believe the problem is that some of these tasks (the ones that take longer, about 20 out of 400) require a lot of memory. When memory runs short, the processes go to sleep, and since memory is never freed, they remain there, sleeping. These are my questions:

  1. Once a process finishes, are the resources (read: memory) freed, or do they remain occupied until all processes finish? In other words, once I have only 20 cores working (because the others already processed all the shorter tasks), do they have access to all the memory, or only to the memory not used by the rest of the processes?

  2. I've read that maxtasksperchild may help in this situation. How would that work? How can I determine the appropriate number of tasks for each child?

If you wonder why I am asking this, it's because in the documentation I read this: "New in version 2.7: maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool."

YamiOmar88
  • The processes don't sleep. They simply raise an Out Of Memory exception. Because they've raised some sort of exception, they're probably not getting to the end of the code, which would ready them to be joined – delyeet Jan 29 '20 at 16:16
  • If they are raising an exception, why isn't my code failing? Once only the 20 longest jobs remain, they eventually go to sleep. It doesn't fail with Out of Memory. – YamiOmar88 Feb 04 '20 at 07:54
  • A failed, unjoined process will simply block until it can join. It's not sleeping, it's stuck in an unhandled error, which is resulting in deadlock – delyeet Feb 06 '20 at 14:40
  • Is there any way this can be handled? I mean, can I re-start these processes somehow once resources are freed? I've been monitoring progress and when resources are available, the processes don't re-start. Nothing happens... – YamiOmar88 Feb 07 '20 at 07:12
  • There are a couple of ways you could go about that... It depends on what your end goal is. Realistically, you should try to avoid running out of memory to begin with (perhaps some strategically placed del statements), or determine whether a particular process will be able to allocate enough RAM (and if not, place it back into a queue). Another option would be to actually handle out-of-memory errors within your process and then raise them to your dispatcher for re-queuing, or split the workload if possible – delyeet Feb 11 '20 at 16:30
  • Assuming you are using Linux (you did not specify), when processes run out of memory they most likely get killed by the kernel OOM killer. This would be quite visible in your application. When using `htop`, do you see the machine actually hitting the memory cap? Do you see it swapping? – noxdafox Feb 12 '20 at 10:01
  • Yes, I am using Linux. However, my program doesn't fail. I don't get an error or exception. Nothing. At some point, no processor is working and nothing happens until it times out. So the out-of-memory condition is not visible in that sense. With `htop` I do see the memory being almost maxed out. The Swp bar is also almost maxed out. Is that what you mean by seeing it swapping? – YamiOmar88 Feb 12 '20 at 12:11

1 Answer


You should leave at least one core available for the OS and one for the initiating script; try reducing your pool size, e.g. Pool(110) (see the combined sketch after these suggestions).

Use Pool.imap (or imap_unordered) instead of Pool.map. This will iterate over the data lazily rather than loading all of it into memory before starting processing.

Set a value for the maxtasksperchild parameter.
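Putting those three suggestions together, a minimal sketch might look like the following. The run_pair wrapper is something I'm adding here because imap_unordered, unlike your starmap call, passes a single argument per task; 110 workers, chunksize=1 and maxtasksperchild=1 are just starting values to tune for your node.

from multiprocessing import Pool

def run_pair(pair):
    # imap_unordered passes one argument per task (unlike starmap),
    # so unpack the (i, j) tuple here.  g is inherited by the workers
    # through fork(), as in your original snippet.
    return g._print_probability_path_ij(*pair)

# Leave a couple of cores free for the OS and the parent script, and
# recycle each worker after one task so its memory is returned to the OS.
pool = Pool(processes=110, maxtasksperchild=1)
for _ in pool.imap_unordered(run_pair, node_combinations, chunksize=1):
    pass  # results are consumed lazily as each task completes
pool.close()
pool.join()

maxtasksperchild=1 is the most aggressive setting (a fresh worker per task); if worker start-up cost matters, you can raise it until memory pressure becomes a problem again.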

When you use a multiprocessing Pool, child processes are created using the fork() system call. Each of those processes starts with a copy of the memory of the parent process at that time. Because you're loading the list of tuples before you create the Pool, the processes in the pool will have a copy of the data.
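If the parent process holds large structures beyond the 400 tuples (for example, whatever g carries besides the node list), one way to limit what each worker inherits and keeps alive is to build the heavy object inside the workers rather than in the parent. A rough sketch, assuming a hypothetical load_graph() that reconstructs g:

from multiprocessing import Pool

g = None  # each worker builds its own graph instead of inheriting the parent's

def init_worker():
    # Runs once per worker at pool start-up; anything heavy created here
    # lives only in that worker, not as a fork()-inherited copy of the
    # parent's memory image.
    global g
    g = load_graph()  # hypothetical: however you construct your graph

def run_pair(pair):
    return g._print_probability_path_ij(*pair)

pool = Pool(processes=110, initializer=init_worker)
for _ in pool.imap_unordered(run_pair, node_combinations):
    pass
pool.close()
pool.join()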

The answer here walks through a method of memory profiling so you can see where your memory is going, and when.
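As a lighter-weight alternative (just a sketch, not the method from that answer), each task can log its worker's peak resident memory with the standard-library resource module, which at least shows which (i, j) pairs blow up; run_pair is the same hypothetical wrapper as above:

import logging
import os
import resource

def run_pair(pair):
    result = g._print_probability_path_ij(*pair)
    # ru_maxrss is this worker's peak resident set size so far,
    # reported in kilobytes on Linux.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logging.info("pid=%d pair=%s peak_rss=%d kB", os.getpid(), pair, peak_kb)
    return result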

JerodG