
I have just run my first piece of multiprocessing code; the test code is shown below. In my test I only ran two processes, to check that it produced the expected results, which it did.

I now want to run it for 'real'. My computer has 8 cores and I want to run approximately 100 processes. My question: if I run the code below and it creates 100 processes, do I need to specify the maximum number of processes to run at one time, or does something in the background realise that there are only 8 cores and optimise accordingly?

import pickle
from multiprocessing import Process, shared_memory


if __name__ == '__main__':

    # set up the data (Somefunc and run_func are defined elsewhere)
    df_data = Somefunc()
    pickled_df = pickle.dumps(df_data)
    size = len(pickled_df)

    # create a shared memory block big enough for the pickled dataframe
    shm = shared_memory.SharedMemory(create=True, size=size)
    shm.buf[:size] = pickled_df

    # Notice that we only pass the name of the block, not the block itself
    processes = [Process(target=run_func, args=(shm.name, x)) for x in range(1, 3)]
    [p.start() for p in processes]
    [p.join() for p in processes]

    shm.close()

    # Unlink should only be called once on a memory block
    shm.unlink()
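
(For reference, run_func isn't shown above; a worker that just attaches to the block by its name looks roughly like the sketch below. This is a simplified illustration, not my exact function.)

from multiprocessing import shared_memory
import pickle

def run_func(shm_name, x):
    # attach to the existing block by its name
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        data = bytes(shm.buf)     # copy the bytes out of shared memory
        df = pickle.loads(data)   # pickle stops at its own end marker, so any
                                  # padding at the end of the block is ignored
        ...                       # real work with df and x goes here
    finally:
        shm.close()               # close, but never unlink, in the worker
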
mHelpMe
  • Tangentially, the code you put inside `if __name__ == '__main__':` should be absolutely trivial. The condition is only useful when you `import` this code; if all the useful functionality is excluded when you `import`, you will never want to do that anyway. See also https://stackoverflow.com/a/69778466/874188 – tripleee Jul 23 '22 at 17:40
  • `multiprocessing.Process` creates a real, OS-managed process; you should probably do some outside research on what that means. You are free to create as many processes as you want, and it's up to the OS to schedule them to run on your CPU. Most computers will have tens to hundreds of processes at any given time, though many will be idle. If you have more active processes than CPU cores, the OS will rapidly swap back and forth between them to make sure each process gets *some* compute time, but only 8 (or however many cores you have) can ever execute at exactly the same time. – Aaron Jul 23 '22 at 17:45
  • @tripleee, that does not seem like it applies to multiprocessing. An ideal main module is also expected to hide code from child processes, to limit the data that has to be copied between parent and child (on Windows, anyway). – Charchit Agarwal Jul 23 '22 at 20:04

1 Answer


If you use a `multiprocessing.Pool` to manage processes, it will default the number of worker processes to the value of `os.cpu_count()`, which in most cases gives you an appropriate value.
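
For example, a pool-based version of the script in your question might look roughly like this (a sketch only; run_func and the shared-memory name are stand-ins for your real code):

from multiprocessing import Pool

def run_func(shm_name, x):    # stand-in for the real worker
    ...

if __name__ == '__main__':
    args = [("some-shm-name", x) for x in range(1, 101)]

    # Pool() defaults to os.cpu_count() workers; the 100 tasks are queued
    # and handed to those workers as they become free
    with Pool() as pool:
        pool.starmap(run_func, args)

In other words, you submit 100 tasks rather than creating 100 processes, and the pool never runs more worker processes than it was created with.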

Similarly, if you use `concurrent.futures` with a `ProcessPoolExecutor`, the default number of workers will be "the number of processors on the machine".
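
A rough equivalent with concurrent.futures, again with placeholder names:

from concurrent.futures import ProcessPoolExecutor

def run_func(shm_name, x):    # stand-in for the real worker
    ...

if __name__ == '__main__':
    # max_workers defaults to the number of processors on the machine
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(run_func, "some-shm-name", x)
                   for x in range(1, 101)]
        for future in futures:
            future.result()   # re-raises any exception from the worker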

If you are just creating `Process` objects and starting them yourself, there is no such default limit.
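
If you want to stay with bare Process objects, you have to throttle them yourself, for example by starting them in batches of roughly one per core. One possible sketch (placeholder names again):

import os
from multiprocessing import Process

def run_func(shm_name, x):    # stand-in for the real worker
    ...

if __name__ == '__main__':
    xs = list(range(1, 101))
    batch_size = os.cpu_count() or 8

    # start at most batch_size processes at a time
    for i in range(0, len(xs), batch_size):
        batch = [Process(target=run_func, args=("some-shm-name", x))
                 for x in xs[i:i + batch_size]]
        for p in batch:
            p.start()
        for p in batch:
            p.join()

This works, but a pool is usually simpler and keeps all cores busy even when tasks in a batch finish at different times.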


See this answer for a discussion of when "the number of cores on which your process can run" is different from "the number of processors on the machine".
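
On Linux, for example, you can compare the two numbers directly (os.sched_getaffinity is not available on every platform):

import os

print(os.cpu_count())                  # processors on the machine
print(len(os.sched_getaffinity(0)))    # cores this process is actually allowed to run on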

larsks