
I want to ask the same question as "Python 3: does Pool keep the original order of data passed to map?", but for joblib. E.g.:

Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in x)

The syntax seems to imply it, but I always worry about the ordering of output in parallel processing, and I don't want to write code based on undocumented behavior.

user3226167
  • It would be nice if we could feed it a nested list of jobs or a dictionary of jobs and it would return the results in the same structure. – endolith Mar 01 '23 at 18:48

2 Answers


TL;DR - it preserves order for both backends.

Extending @Chris Farr's answer, I implemented a simple test: a function waits for a random amount of time (you can check that the wait times are not identical) and then returns it. The order is preserved every time, with both backends.

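from joblib import Parallel, delayed
import numpy as np
import time

def f(wait):
    # Sleep for the given duration so tasks finish out of submission order.
    time.sleep(wait)
    return wait

if __name__ == "__main__":
    n = 50
    waits = np.random.uniform(low=0, high=1, size=n)
    # Run the same check with the default backend (loky) and with multiprocessing.
    for backend in ("loky", "multiprocessing"):
        res = Parallel(n_jobs=8, backend=backend)(delayed(f)(wait) for wait in waits)
        # True if the results come back in the same order as the inputs.
        print(backend, np.array_equal(res, waits))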
Yair Daon

Per the joblib documentation, you can specify the backend as multiprocessing, which is based on multiprocessing.Pool. In that case the answer to the linked Pool question applies, and the results are in fact ordered.

Parallel(n_jobs=2, backend="multiprocessing")(delayed(sqrt)(i ** 2) for i in x)

By default, however, joblib uses loky. For that backend the ordering isn't immediately clear from the documentation, but it can be checked by implementing tests.
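A minimal sketch of such a test, offered as an illustration rather than a definitive check: earlier tasks sleep longer, so they tend to finish last, and the returned list is compared with the input order. The names slow_identity and results_in_input_order are hypothetical helpers, and the delay values are arbitrary assumptions.

```python
import time

from joblib import Parallel, delayed


def slow_identity(value, delay):
    # Hypothetical helper: sleep, then hand the input back unchanged.
    time.sleep(delay)
    return value


def results_in_input_order(n_tasks=8, n_jobs=4):
    # Earlier tasks get longer delays, so they tend to complete last.
    inputs = list(range(n_tasks))
    delays = [0.05 * (n_tasks - i) for i in inputs]
    # No backend argument is passed, so joblib's default backend is used.
    results = Parallel(n_jobs=n_jobs)(
        delayed(slow_identity)(i, d) for i, d in zip(inputs, delays)
    )
    # Parallel returns a list here; compare it with the submission order.
    return results == inputs


if __name__ == "__main__":
    print(results_in_input_order())
```

A passing run only shows that order was preserved for that run; it is evidence of the behavior, not a documented guarantee.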

Chris Farr