I understand this is a slightly vague and open-ended question, but I need some help in this area, as a quick Google/Stack Overflow search hasn't turned up anything useful.
The basic idea is to use multiple processes to speed up an expensive computation that currently runs sequentially in a loop. The caveat is that I have two significant data structures that are accessed by the expensive function:
- one data structure will be read by all processes but is never modified by any process (so it could simply be copied to each process, assuming memory size isn't an issue, which in this case it isn't)
- the other data structure will spend most of its time being read by processes, but will occasionally be written to, and each update needs to be visible to all processes from that point onwards
Currently the program works, in very simplified form, like so:
def do_all_the_things(self):
    read_only_obj = {...}
    read_write_obj = {...}
    output = []
    for i in range(4):
        for j in range(4):
            output.append(do_expensive_operation(read_only_obj, read_write_obj))
    return output
In a single-process world this is fine: any changes made to read_write_obj happen sequentially, so each subsequent call sees the latest state.
What I am looking to do is run each instance of do_expensive_operation in a separate process, so that a multi-core machine can be fully utilised.
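For concreteness, this is roughly the shape I had imagined, using a Pool (an untested sketch; do_expensive_operation and the placeholder dicts just stand in for my real code, and as far as I can tell each worker would end up with its own copy of read_write_obj, which is exactly the part I don't know how to handle):

import multiprocessing as mp

def do_expensive_operation(read_only_obj, read_write_obj):
    ...  # placeholder for the real computation

def do_all_the_things():
    read_only_obj = {}   # placeholder: large, never modified
    read_write_obj = {}  # placeholder: mostly read, occasionally updated
    tasks = [(read_only_obj, read_write_obj) for _ in range(4 * 4)]
    with mp.Pool() as pool:
        # starmap unpacks each tuple into the two arguments
        output = pool.starmap(do_expensive_operation, tasks)
    return output

if __name__ == "__main__":
    print(do_all_the_things())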
The two things I am looking to understand are:
- How does the whole multiprocessing machinery work? I have seen Queues and Pools mentioned and don't understand which I should be using in this situation.
- I have a feeling that sharing read_only_obj and read_write_obj between processes is going to be complicated. Is this possible? Advisable? And how do I go about it? (My rough guess at what it might look like is sketched below.)
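In case it helps to see what I was imagining for the sharing part, here is my guess at how read_write_obj might be shared via a multiprocessing.Manager, with read_only_obj simply copied to each worker. I genuinely don't know whether this is correct, efficient, or advisable:

import multiprocessing as mp

def do_expensive_operation(read_only_obj, shared_rw):
    # shared_rw is a Manager proxy: reads and writes go through the
    # manager process, so an update is seen by every worker from then on
    shared_rw["last_key_count"] = len(read_only_obj)
    return len(shared_rw)

def do_all_the_things():
    read_only_obj = {"some": "big lookup table"}  # copied into each worker

    with mp.Manager() as manager:
        shared_rw = manager.dict()  # proxy object shared between processes
        tasks = [(read_only_obj, shared_rw) for _ in range(4 * 4)]
        with mp.Pool() as pool:
            output = pool.starmap(do_expensive_operation, tasks)
    return output

if __name__ == "__main__":
    print(do_all_the_things())

In particular, I have no idea how costly the proxy indirection is for an object that is mostly read.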
Thank you for your time!