
I define a CPU-bound function:

def countdown(n):
    while n > 0:
        n -= 1

Running countdown(50000000) takes 2.16 seconds on my laptop.
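
For reference, a timing like this can be reproduced with a plain wall-clock harness along these lines (time.perf_counter is my choice here; the exact numbers will of course differ between machines):

import time

def countdown(n):
    while n > 0:
        n -= 1

start = time.perf_counter()
countdown(50000000)
print(f"serial: {time.perf_counter() - start:.2f} s")  # ~2.16 s on my laptop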

First, I test parallelization with multiprocess:

from multiprocess import Pool

with Pool(2) as p:
    l = p.map(countdown, [50000000, 50000000])

This takes 2.46 seconds, which is good parallelization (two tasks in roughly the single-task time).
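
(Side note, for anyone reproducing this with the standard library: as far as I know, multiprocess is a fork of the stdlib multiprocessing that uses dill for serialization, so a stdlib sketch would look roughly like the one below; the __main__ guard is needed when worker processes are spawned, e.g. on Windows or macOS.)

import multiprocessing  # stdlib counterpart of the third-party multiprocess package

def countdown(n):
    while n > 0:
        n -= 1

if __name__ == "__main__":  # required when workers are started via spawn (Windows, macOS)
    with multiprocessing.Pool(2) as p:
        l = p.map(countdown, [50000000, 50000000])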

Then, I test the Dask processes scheduler:

import dask

l = [dask.delayed(countdown)(50000000), dask.delayed(countdown)(50000000)]
dask.compute(l, scheduler='processes', num_workers=2)

However, it takes 4.53 seconds. This is the same speed as

dask.compute(l, scheduler='threads', num_workers=2)

What is wrong with the Dask processes scheduler? I expected it to be on a par with multiprocess.
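
For completeness, the scheduler can also be selected via dask.config instead of the scheduler= keyword; as far as I understand these are equivalent, so I would expect the same 4.53 seconds from this variant:

import dask

# countdown as defined at the top of the question
def countdown(n):
    while n > 0:
        n -= 1

l = [dask.delayed(countdown)(50000000), dask.delayed(countdown)(50000000)]
with dask.config.set(scheduler='processes'):
    dask.compute(l, num_workers=2)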

user15964
  • If this is a bug or an unexpected behavior, it could be a good idea to [post an issue](https://github.com/dask/dask/issues/) so Dask developers can improve this. – Jérôme Richard Aug 10 '21 at 21:41

1 Answer


The following works, so I'm not sure if the above is a bug or a feature:

import dask
from dask.distributed import Client

with Client(n_workers=2) as client:
    l = [dask.delayed(countdown)(50000000), dask.delayed(countdown)(50000000)]
    dask.compute(*l)
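
As a usage note: run as a standalone script (rather than in a notebook), this needs the usual if __name__ == "__main__": guard on platforms where worker processes are spawned (Windows, macOS), because the Client starts its own worker processes. A self-contained sketch with the same n_workers=2 as above:

import dask
from dask.distributed import Client

def countdown(n):
    while n > 0:
        n -= 1

if __name__ == "__main__":
    with Client(n_workers=2) as client:
        l = [dask.delayed(countdown)(50000000), dask.delayed(countdown)(50000000)]
        print(dask.compute(*l))  # returns (None, None); the point is the wall-clock time
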
SultanOrazbayev
  • Thank you so much. It indeed works. But I intentionally avoid using distributed, because I found that the distributed Client sometimes consumes unreasonably huge amounts of memory. There are some posts about this, e.g. https://stackoverflow.com/q/56779254/1911722 – user15964 Aug 03 '21 at 12:21
  • Thanks, I was also surprised when I could reproduce your results... to me it seems like a bug, but I don't have a good enough understanding to be sure... – SultanOrazbayev Aug 03 '21 at 12:27
  • Regarding "Client consumes unreasonably huge memory": that situation should be solved too. The distributed scheduler is generally recommended over multiprocessing, even on a single machine. – mdurant Aug 09 '21 at 12:22