
I am using the dask library to try to create multi-threaded programs, but I am facing a problem. Example:

from dask import compute, delayed
import dask.multiprocessing

arr = []

def add():
    arr.append("a")  # mutate a module-level global

tasks = [delayed(add)(), delayed(add)()]
compute(*tasks, get=dask.multiprocessing.get)
print(arr)

This code's output is simply `[]`, because I am using multiprocessing: each worker process operates on its own copy of `arr`, so the parent process's list is never touched. If I use `get=dask.threaded.get` instead, the output is `['a', 'a']`.

I also need to use multiprocessing to achieve actual parallelism on multiple cores.

So my question is: is there a way to use dask.multiprocessing and still have the ability to access a shared object?

omargamal8
  • Possible duplicate of [Shared-memory objects in python multiprocessing](https://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing) – pppery Aug 02 '17 at 01:59
  • @ppperry The suggested question asks how the problem is handled in Python's multiprocessing library. I am asking about the dask library; the two have totally different implementations. – omargamal8 Aug 02 '17 at 02:54

1 Answer


Under normal operation Dask assumes that functions do not rely on global state. Your functions should consume inputs and return outputs, and should not depend on any information beyond what they are given.
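For example, here is the snippet from the question rewritten in that style, a minimal sketch that keeps the same `get=dask.multiprocessing.get` keyword; `compute` gathers the returned values into a tuple, so no shared object is needed:

from dask import compute, delayed
import dask.multiprocessing

def add():
    return "a"  # return a value instead of mutating a global

tasks = [delayed(add)(), delayed(add)()]
results = compute(*tasks, get=dask.multiprocessing.get)
print(results)  # ('a', 'a')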

Even when using the threaded scheduler you should be wary of mutating global state, because that state might not be accessed in a thread-safe way.
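If you do mutate global state under the threaded scheduler, a minimal sketch is to guard the shared object with an explicit lock; the `threading.Lock` here is an addition for illustration, not something Dask provides:

import threading

from dask import compute, delayed
import dask.threaded

arr = []
lock = threading.Lock()

def add():
    with lock:  # serialize access to the shared list across threads
        arr.append("a")

tasks = [delayed(add)(), delayed(add)()]
compute(*tasks, get=dask.threaded.get)
print(arr)  # ['a', 'a']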

MRocklin
  • So you are saying that there is no way? – omargamal8 Aug 02 '17 at 02:55
  • You might take a look at the bottom of http://dask.pydata.org/en/latest/futures.html for some more advanced techniques (a sketch follows these comments). – MRocklin Aug 02 '17 at 05:04
  • But generally there is no way to do precisely what you're asking for. As you parallelize across processes or machines you necessarily lose access to shared state. – MRocklin Aug 02 '17 at 05:05
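As a minimal sketch of the futures interface those comments point to, assuming a local cluster started with `Client(processes=True)` (that setup is an assumption, not part of the original answer): tasks still run in separate worker processes, but their results come back to the client through futures rather than through a shared object:

from dask.distributed import Client

client = Client(processes=True)  # assumed local cluster with worker processes

def add():
    return "a"

# pure=False keeps Dask from deduplicating the two identical calls
futures = [client.submit(add, pure=False) for _ in range(2)]
print(client.gather(futures))  # ['a', 'a']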