I'm trying to use Joblib to parallelize some calculations in my code. Here's what I have so far:
data = {'tag': [], 'radials': []}
def append_radials(*args):
for tag in range(len(sacla_tags)):
det_data = get_detdata_for_single_tag(sacla_tags[tag], beamline, run)
radial_profile = np.array(profiler.get_mean_profile(det_data))
data['radials'].append(radial_profile)
data['tag'].append(sacla_tags[tag])
args = [sacla_tags, beamline, run, profiler]
Parallel(n_jobs=5, verbose=50)(delayed(append_radials)(*args) for i in range(5))
In short, the append_radials
function gathers some data, throws it into the radial_profile
line, then appends the output to the data
dictionary. This is iterating over a very large dataset, which takes ~15 minutes to complete. I want to use joblib to speed this up a bit.
The way the code is written, it loops through every tag (len(sacla_tags)=5000
) in each job. Instead, I want it to split up the tags into say, 5 different job (1000 tags per job), then append the output to the global dictionary data
.
Is this possible? What am I doing wrong here?
EDIT:
Here's a simpler example of my code.
def f(i):
dat = []
q = i**2 * np.sqrt(i**(3/2))
dat.append(q)
return dat
x = np.linspace(1,10000000,1000000)
p = Parallel(n_jobs=5, verbose=50)(delayed(f)(x) for j in range(5))
The output of p
here is:
In [37]: p
Out[37]:
[[array([1.00000000e+00, 7.30854392e+02, 4.32617501e+03, ...,
1.77826963e+19, 1.77827452e+19, 1.77827941e+19])],
[array([1.00000000e+00, 7.30854392e+02, 4.32617501e+03, ...,
1.77826963e+19, 1.77827452e+19, 1.77827941e+19])],
[array([1.00000000e+00, 7.30854392e+02, 4.32617501e+03, ...,
1.77826963e+19, 1.77827452e+19, 1.77827941e+19])],
[array([1.00000000e+00, 7.30854392e+02, 4.32617501e+03, ...,
1.77826963e+19, 1.77827452e+19, 1.77827941e+19])],
[array([1.00000000e+00, 7.30854392e+02, 4.32617501e+03, ...,
1.77826963e+19, 1.77827452e+19, 1.77827941e+19])]]
Each array in p
is the same length, joblib didn't split up the range. Instead, it ran f(x)
five times.
I know this is some basic concept I'm not quite grasping with how joblib works, but how would I tell it to, for example, split x
up into 5 chunks and run those in parallel?