One key difference is when, and in what order, the results of your worker function are handed back to you. If you only use your worker for its side effects (creating files, etc.) and don't expect it to return anything, this does not apply to you.
from multiprocessing import Pool
import time

start = time.time()

def get_time():
    return int(time.time() - start)

def worker(args):
    name, delay = args
    print(f'{get_time()}: Job {name} started ({delay} seconds)')
    time.sleep(delay)
    return f'Job {name} done'

jobs = [
    ('A', 1),
    ('B', 2),
    ('C', 10),
    ('D', 3),
    ('E', 4),
    ('F', 5),
]

if __name__ == '__main__':
    with Pool(2) as pool:
        for result in pool.map(worker, jobs):
            print(f'{get_time()}: {result}')
If you use map, the code generates this output:
0: Job A started (1 seconds)
0: Job B started (2 seconds)
1: Job C started (10 seconds)
2: Job D started (3 seconds)
5: Job E started (4 seconds)
9: Job F started (5 seconds)
14: Job A done
14: Job B done
14: Job C done
14: Job D done
14: Job E done
14: Job F done
As you can see, all results are returned in bulk, in input order, at the 14th second, regardless of when each job actually finished.
If you change the method to imap, the code generates this output:
0: Job A started (1 seconds)
0: Job B started (2 seconds)
1: Job C started (10 seconds)
1: Job A done
2: Job D started (3 seconds)
2: Job B done
5: Job E started (4 seconds)
9: Job F started (5 seconds)
11: Job C done
11: Job D done
11: Job E done
14: Job F done
The whole run again finishes at the 14th second, but some results (A, B) are returned earlier, as soon as those jobs finish. This method still preserves the input order, so even though jobs D and E finished at the 5th and 9th second (you can work this out from their start times and delays), their results could not be returned earlier: they had to wait for the long job C until the 11th second.
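If you want to check those numbers without running the pool, here is a small sketch that redoes the arithmetic. It is a simplified model, not what multiprocessing does internally: it assumes two workers, jobs handed out one at a time in input order to whichever worker is free first, and no overhead. The helper names schedule and ordered_delivery are made up for this illustration.

```python
import heapq

def schedule(jobs, workers=2):
    """Return {name: (start, finish)} under the simplified model above."""
    free_at = [0] * workers              # time at which each worker becomes idle
    heapq.heapify(free_at)
    times = {}
    for name, delay in jobs:
        start = heapq.heappop(free_at)   # the earliest idle worker takes the job
        finish = start + delay
        times[name] = (start, finish)
        heapq.heappush(free_at, finish)
    return times

def ordered_delivery(jobs, times):
    """imap-style delivery: a result waits for every earlier result."""
    delivered, latest = {}, 0
    for name, _ in jobs:
        latest = max(latest, times[name][1])
        delivered[name] = latest
    return delivered

jobs = [('A', 1), ('B', 2), ('C', 10), ('D', 3), ('E', 4), ('F', 5)]
times = schedule(jobs)
delivered = ordered_delivery(jobs, times)

for name, _ in jobs:
    start, finish = times[name]
    print(f'{name}: started {start}, finished {finish}, '
          f'delivered in order at {delivered[name]}')
```

Under these assumptions D finishes at second 5 and E at second 9, but both are delivered at second 11, right after C, which matches the imap output above.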
If you change the method to imap_unordered, the code generates this output:
0: Job A started (1 seconds)
0: Job B started (2 seconds)
1: Job C started (10 seconds)
1: Job A done
2: Job D started (3 seconds)
2: Job B done
5: Job E started (4 seconds)
5: Job D done
9: Job F started (5 seconds)
9: Job E done
11: Job C done
14: Job F done
Now every result is returned as soon as its job finishes. The input order is not preserved.
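Since the order is lost, you may need a way to tell which result belongs to which input. One common approach, sketched here as a variation on the worker above rather than anything the pool does for you, is to return the job name together with the result and collect the pairs into a dict. The delays are shortened so that the example runs quickly.

```python
from multiprocessing import Pool
import time

def worker(args):
    name, delay = args
    time.sleep(delay)
    # Return the input name alongside the result so the caller can
    # match them up even when results arrive out of order.
    return name, f'Job {name} done'

# Shortened delays, chosen only for this illustration.
jobs = [('A', 0.1), ('B', 0.2), ('C', 0.5), ('D', 0.3)]

if __name__ == '__main__':
    with Pool(2) as pool:
        # Consume the iterator inside the with block, while the pool is alive.
        by_name = dict(pool.imap_unordered(worker, jobs))
    # Arrival order may vary, but lookups by name do not depend on it.
    for name, _ in jobs:
        print(name, '->', by_name[name])
```

The same tagging works with map and imap too; it just isn't needed there, because those already hand the results back in input order.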