
I'm trying to see whether Dask would be a suitable addition to my project, so I wrote some very simple test cases to look into its performance. However, Dask is taking a relatively long time simply to perform the lazy initialization.

import dask
import dask.multiprocessing
from dask import compute, delayed

@delayed
def normd(st):
    return st.lower().replace(',', '')

@delayed
def add_vald(v):
    return v+5

def norm(st):
    return st.lower().replace(',', '')

def add_val(v):
    return v+5

test_list = [i for i in range(1000)]
test_list1 = ["AeBe,oF,221e"]*1000

%timeit rlist = [add_val(y) for y in test_list]
#124 µs ± 7.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit rlist = [norm(y) for y in test_list1]
#392 µs ± 18.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit rlist = [add_vald(y) for y in test_list]
#19.1 ms ± 436 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

rlist = [add_vald(y) for y in test_list]
%timeit rlist1 = compute(*rlist, get=dask.multiprocessing.get)
#892 ms ± 36.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit rlist = [normd(y) for y in test_list1]
#18.7 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

rlist = [normd(y) for y in test_list1]
%timeit rlist1 = compute(*rlist, get=dask.multiprocessing.get)
#912 ms ± 54.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I have looked into "Dask For Loop In Parallel" and "parallel dask for loop slower than regular loop?", and I tried increasing the size to 1 million items: while the regular loop takes about a second, the Dask loop never ends. After waiting half an hour for the lazy initialization of add_vald alone to finish, I killed it.

I'm not sure what's going wrong here and would greatly appreciate any insight you might be able to offer. Thanks!

ltt

1 Answer


When creating a delayed object, dask is doing a couple of things:

  • calculating a unique key for the object, based on the function and inputs
  • creating a graph object to store the desired operations.

You could probably do these things a little faster with your own dict comprehension; delayed is intended for convenience.
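
For illustration, a minimal sketch of what that might look like, building the graph dict directly (the key names here are arbitrary):

import dask

def add_val(v):
    return v + 5

test_list = list(range(1000))

# A dask graph is just a dict mapping unique keys to task tuples of
# (function, *args); building it by hand skips delayed's key hashing.
dsk = {('add_val', i): (add_val, v) for i, v in enumerate(test_list)}
keys = list(dsk)

# dask.get is the synchronous (single-threaded) scheduler.
results = dask.get(dsk, keys)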

Upon execution, each task requires some overhead, whether from switching threads or from communicating between processes, depending on the scheduler chosen. This is well documented. Furthermore, threads within a process won't actually run in parallel for this workload because of Python's GIL.
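
As a side note, newer Dask versions select the scheduler with the scheduler= keyword to compute rather than the older get= keyword used in the question; a minimal sketch comparing the two schedulers:

import dask
from dask import compute, delayed

@delayed
def add_vald(v):
    return v + 5

tasks = [add_vald(i) for i in range(1000)]

# Threaded scheduler: low per-task overhead, but the GIL serializes
# pure-Python work like this.
res_threads = compute(*tasks, scheduler='threads')

# Multiprocessing scheduler: sidesteps the GIL, but pays pickling and
# inter-process communication costs for every task.
res_procs = compute(*tasks, scheduler='processes')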

In general, it is recommended to split your work into batches, so that the overhead per task becomes small compared to the time it takes to execute the task and using dask becomes worthwhile, as in the sketch below. Don't forget the first rule of dask: be sure that you need dask.
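
A minimal sketch of batching for the add_val example (the batch size is an arbitrary illustration; tune it until per-task overhead is negligible):

from dask import compute, delayed

def add_val_batch(batch):
    # Plain Python loop inside the task: one task per batch, not per item.
    return [v + 5 for v in batch]

test_list = list(range(1_000_000))
batch_size = 10_000  # hypothetical value

batches = [test_list[i:i + batch_size]
           for i in range(0, len(test_list), batch_size)]
tasks = [delayed(add_val_batch)(b) for b in batches]  # 100 tasks, not 1,000,000

results = compute(*tasks, scheduler='processes')
flat = [x for batch in results for x in batch]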

mdurant