Suppose you want to specify the number of workers used by dask.array. As the Dask documentation shows, you can set:
dask.set_options(pool=ThreadPool(num_workers))
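For reference, later Dask releases replaced `dask.set_options` with `dask.config.set`. Below is a minimal sketch of the same setting in that API. It assumes a recent Dask version and the threaded scheduler. It also shows the alternative of passing `num_workers` straight to `compute()`. The value of `num_workers` and the toy array are illustrative and do not come from the original setup.

```python
from multiprocessing.pool import ThreadPool

import dask
import dask.array as da

num_workers = 4  # illustrative value
x = da.ones((1000, 1000), chunks=250)  # toy array for demonstration

# Option 1: give the threaded scheduler an explicit pool through configuration.
with ThreadPool(num_workers) as pool:
    with dask.config.set(pool=pool):
        total_a = x.sum().compute(scheduler="threads")

# Option 2: let the threaded scheduler build its own pool of num_workers threads.
total_b = x.sum().compute(scheduler="threads", num_workers=num_workers)

print(total_a, total_b)
```

Both options only limit the threads that Dask itself uses to run tasks.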
This works well for some simulations I've run (Monte Carlo simulations, for example). With some linear algebra operations, however, Dask seems to override the user-specified configuration. For example:
import dask
import dask.array as da
from multiprocessing.pool import ThreadPool
num_workers = 4      # placeholder values for illustration
size = 10000
chunk_size = 1000
dask.set_options(pool=ThreadPool(num_workers))
mat1 = da.random.random((size, size), chunks=chunk_size)
mat2 = da.random.random((size, size), chunks=chunk_size)
mat3 = mat1.dot(mat2)
mat3.compute()
If I run this program with a small matrix size, it apparently uses only num_workers workers. If I increase the matrix size, however, it suddenly creates dozens of workers, as the image shows.
So how can I make Dask solve the problem using only num_workers workers?
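One hypothesis is worth ruling out, although it is an assumption that the setup above does not confirm. The extra threads may not be Dask workers at all. They could instead be threads started by the multithreaded BLAS library (such as OpenBLAS or MKL) that NumPy calls for each chunk's matrix product, and small chunks may not trigger them.

The sketch below caps those libraries through their environment variables. This only takes effect if the variables are set before NumPy is first imported. The variable names assume OpenBLAS, MKL, or an OpenMP-based BLAS, and the sizes are illustrative.

```python
import os

# Assumption: NumPy is linked against a multithreaded BLAS that reads these
# variables. They must be set before NumPy (or dask.array) is first imported.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

from multiprocessing.pool import ThreadPool

import dask
import dask.array as da

num_workers = 4               # illustrative value
size, chunk_size = 2000, 500  # illustrative values

mat1 = da.random.random((size, size), chunks=chunk_size)
mat2 = da.random.random((size, size), chunks=chunk_size)

# Dask's own task threads are limited by the pool; BLAS threads by the env vars.
with ThreadPool(num_workers) as pool:
    with dask.config.set(pool=pool):
        mat3 = mat1.dot(mat2).compute()

print(mat3.shape)
```

Suppose the thread count stays at num_workers once these variables are set. That would suggest the extra threads came from BLAS rather than from Dask's pool.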