
When I try to parallelize a for loop with dask, it ends up executing slower than the plain version. I'm basically following the introductory example from the dask tutorial, but for some reason I get no speedup on my end. What am I doing wrong?

In [1]: import numpy as np
   ...: from dask import delayed, compute
   ...: import dask.multiprocessing

In [2]: a10e4 = np.random.rand(10000, 11).astype(np.float16)
   ...: b10e4 = np.random.rand(10000, 11).astype(np.float16)

In [3]: def subtract(a, b):
   ...:     return a - b

In [4]: %%timeit
   ...: results = [subtract(a10e4, b10e4[index]) for index in range(len(b10e4))]
1 loop, best of 3: 10.6 s per loop

In [5]: %%timeit
   ...: values = [delayed(subtract)(a10e4, b10e4[index]) for index in range(len(b10e4))]
   ...: resultsDask = compute(*values, get=dask.multiprocessing.get)
1 loop, best of 3: 14.4 s per loop
mistakeNot

1 Answer


Two issues:

  1. Dask introduces about a millisecond of overhead per task. You'll want to ensure that your computations take significantly longer than that.
  2. When using the multiprocessing scheduler, data gets serialized between processes, which can be quite expensive. See http://dask.pydata.org/en/latest/setup.html. A sketch addressing both points follows this list.
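
As a rough sketch of both points (not code from the thread): grouping many row subtractions into one delayed call amortizes the ~1 ms per-task overhead, and a threaded scheduler avoids inter-process serialization for NumPy work, which releases the GIL during the subtraction. The subtract_block helper and the block_size value are illustrative choices; note that newer dask versions select the scheduler with scheduler=... rather than get=....

    import numpy as np
    from dask import delayed, compute

    a10e4 = np.random.rand(10000, 11).astype(np.float16)
    b10e4 = np.random.rand(10000, 11).astype(np.float16)

    def subtract_block(a, b_block):
        # One task covers a whole block of rows, so the ~1 ms per-task
        # overhead is paid once per block instead of once per row.
        return [a - row for row in b_block]

    block_size = 1000  # illustrative; tune so each task runs well over 1 ms
    values = [delayed(subtract_block)(a10e4, b10e4[i:i + block_size])
              for i in range(0, len(b10e4), block_size)]
    blocks = compute(*values, scheduler="threads")  # threads: no pickling of the arrays
    results = [r for block in blocks for r in block]

This produces the same list of 10,000 difference arrays as the original loop, but with only ten tasks instead of ten thousand.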
MRocklin
  • I see. Well, there are 10,000 subtractions going on, each taking ~1 ms. I was thinking that dask splits up the for loop and sends ~1250 subtractions to each of the separate CPUs (8 cores), but that might be me misunderstanding it. Is it splitting up individual subtraction operations for the CPUs to process? That would explain why there is no speedup – mistakeNot Feb 12 '18 at 15:53
  • It does send roughly 1250 subtractions to each CPU, but there is overhead to each operation: every delayed call costs you about 1 ms. You're encouraged to group things together. For a simple operation like this you'd use something like dask.array, dask.dataframe or dask.bag, which group things for you. – MRocklin Feb 12 '18 at 21:20
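
To illustrate the dask.array suggestion from the comment above (a sketch under assumed chunk sizes, not code from the thread): the explicit Python loop can be written as one broadcast subtraction over chunked arrays, so the chunking does the grouping for you.

    import numpy as np
    import dask.array as da

    a10e4 = np.random.rand(10000, 11).astype(np.float16)
    b10e4 = np.random.rand(10000, 11).astype(np.float16)

    # Coarse chunks: each chunk becomes one task, amortizing the per-task overhead.
    da_a = da.from_array(a10e4, chunks=(2000, 11))
    da_b = da.from_array(b10e4, chunks=(2000, 11))

    # Broadcasting replaces the loop: result[i] is equivalent to a10e4 - b10e4[index].
    result = da_a[None, :, :] - da_b[:, None, :]   # lazy, shape (10000, 10000, 11)

    # The full result is roughly 2 GB in float16, so materialize only what is needed.
    first_rows = result[:100].compute()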