13

What is the difference between the following LocalCluster configurations for dask.distributed?

Client(n_workers=4, processes=False, threads_per_worker=1)

versus

Client(n_workers=1, processes=True, threads_per_worker=4)

They both have four threads working on the task graph, but the first has four workers. What, then, would be the benefit of having multiple workers acting as threads as opposed to a single worker with multiple threads?

Edit: just a clarification, I'm aware of the difference between processes, threads and shared memory, so this question is oriented more towards the configurational differences of these two Clients.

jrinker

3 Answers

8

I was inspired by both Victor and Martin's answers to dig a little deeper, so here's an in-depth summary of my understanding. (couldn't do it in a comment)

First, note that the scheduler printout in this version of dask isn't quite intuitive: `processes` is actually the number of workers, and `cores` is actually the total number of threads across all workers.

Secondly, Victor's comments about the TCP address and adding/connecting more workers are good to point out. I'm not sure if more workers could be added to a cluster with processes=False, but I think the answer is probably yes.

Now, consider the following script:

from dask.distributed import Client

if __name__ == '__main__':
    with Client(processes=False) as client:  # Config 1
        print(client)
    with Client(processes=False, n_workers=4) as client:  # Config 2
        print(client)
    with Client(processes=False, n_workers=3) as client:  # Config 3
        print(client)
    with Client(processes=True) as client:  # Config 4
        print(client)
    with Client(processes=True, n_workers=3) as client:  # Config 5
        print(client)
    with Client(processes=True, n_workers=3,
                threads_per_worker=1) as client:  # Config 6
        print(client)

This produces the following output in dask version 2.3.0 for my laptop (4 cores):

<Client: scheduler='inproc://90.147.106.86/14980/1' processes=1 cores=4>
<Client: scheduler='inproc://90.147.106.86/14980/9' processes=4 cores=4>
<Client: scheduler='inproc://90.147.106.86/14980/26' processes=3 cores=6>
<Client: scheduler='tcp://127.0.0.1:51744' processes=4 cores=4>
<Client: scheduler='tcp://127.0.0.1:51788' processes=3 cores=6>
<Client: scheduler='tcp://127.0.0.1:51818' processes=3 cores=3>

Here's my understanding of the differences between the configurations:

  1. The scheduler and all workers run as threads within the client process. (As Martin said, this is useful for introspection.) Because neither the number of workers nor the number of threads per worker is given, dask calls its function `nprocesses_nthreads()` to set the defaults (with `processes=False`: 1 worker and as many threads as available cores).
  2. Same as 1, but since `n_workers` was given, dask chooses the threads per worker such that the total number of threads equals the number of cores (i.e., 1 thread each). Again, `processes` in the print output is not exactly correct -- it's actually the number of workers (which in this case are really threads).
  3. Same as 2, but since `n_workers` doesn't divide evenly into the number of cores, dask chooses 2 threads/worker, overcommitting rather than undercommitting.
  4. The Client, Scheduler and all workers are separate processes. Dask chooses the default number of workers (equal to cores because it's <= 4) and the default number of threads/worker (1).
  5. Same processes/thread setup as 4, but with `n_workers=3`; the total threads are overprescribed (2 threads/worker) for the same reason as in 3.
  6. This behaves as expected.
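
The default selection described above can be approximated in a few lines of plain Python. This is a rough sketch of the heuristic as observed in the printouts, not dask's actual `nprocesses_nthreads()` implementation, and the behaviour may differ in other versions:

```python
import math

def approx_defaults(cores, processes=True, n_workers=None, threads_per_worker=None):
    # Rough approximation of dask's defaults as observed above -- not the
    # real nprocesses_nthreads() code.
    if n_workers is None:
        # processes=False defaults to a single (threaded) worker;
        # processes=True defaults to one worker per core (on a 4-core machine).
        n_workers = cores if processes else 1
    if threads_per_worker is None:
        # Overcommit rather than undercommit when workers don't divide cores.
        threads_per_worker = math.ceil(cores / n_workers)
    return n_workers, threads_per_worker

# Reproduce the six configurations above (4-core machine):
print(approx_defaults(4, processes=False))               # (1, 4) -- Config 1
print(approx_defaults(4, processes=False, n_workers=4))  # (4, 1) -- Config 2
print(approx_defaults(4, processes=False, n_workers=3))  # (3, 2) -- Config 3
print(approx_defaults(4, processes=True))                # (4, 1) -- Config 4
print(approx_defaults(4, processes=True, n_workers=3))   # (3, 2) -- Config 5
print(approx_defaults(4, processes=True, n_workers=3,
                      threads_per_worker=1))             # (3, 1) -- Config 6
```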
jrinker
  • considering your explanations in bullets 2 and 3, if processes = 3 actually means "3 threads" (bullet 2), what does it mean to have 2 threads per thread? (3 processes, 6 cores for client 3) @jrinker – honor Jan 13 '21 at 15:31
  • 1
    I am working in a machine with 4 available cores and I am able to define a client with 5 `n_workers` and more , why is that? How does that fit the idea that `processes` is actually the number of workers and why is that not limited to the number of physical cores in my machine? – Luis Chaves Rodriguez Jan 26 '21 at 11:45
6

You are conflating a couple of different things here:

  • the balance between the number of processes and threads, with different mixtures favouring different workloads. More threads per worker means better sharing of memory resources and less serialisation; fewer threads and more processes means less contention for the GIL

  • with processes=False, both the scheduler and workers are run as threads within the same process as the client. Again, they will share memory resources, and you will not even have to serialise objects between the client and scheduler. However, you will have many threads, and the client may become less responsive. This is commonly used for testing, as the scheduler and worker objects can be directly introspected.
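
The GIL trade-off in the first bullet can be illustrated without dask at all, using the standard-library `concurrent.futures` pools as a stand-in for threaded versus multi-process workers. A pure-Python, CPU-bound task holds the GIL, so a thread pool gains little parallelism, while a process pool sidesteps the GIL at the cost of serialising arguments and results:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def cpu_bound(n):
    # A pure-Python, CPU-bound loop: holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    args = [200_000] * 4
    # Threads: share memory, no serialisation, but serialised by the GIL here.
    with ThreadPoolExecutor(max_workers=4) as tp:
        thread_results = list(tp.map(cpu_bound, args))
    # Processes: true parallelism for this task, but args/results are pickled.
    with ProcessPoolExecutor(max_workers=4) as pp:
        process_results = list(pp.map(cpu_bound, args))
    # Both give the same answers; only the parallelism model differs.
    print(thread_results == process_results)
```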

mdurant
  • Hi Martin, the second bullet point is the info I was looking for -- thanks! I dug into several different configurations and made my own answer with, well, probably way too much detail. Hopefully everything that I wrote is correct, but if you see that I made a mistake there, please let me know so I can fix it. – jrinker Sep 06 '19 at 16:53
5

When you use processes=False you are constraining your cluster to run entirely within a single process on your machine.

from dask.distributed import Client

# The address provided by processes=False is an in-process transport address.
# It is used for communication between threads;
# the scheduler and workers live in the same process.
client = Client(processes=False)
client
<Client: scheduler='inproc://10.0.0.168/31904/1' processes=1 cores=4>

# The address provided by processes=True uses the TCP protocol.
# This is a network address: you can start workers on other machines
# just by pointing them at this scheduler address
# (all machines must be on the same network).
client = Client(processes=True)
client
<Client: scheduler='tcp://127.0.0.1:53592' processes=4 cores=4>
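
The practical consequence of the two address types can be sketched without dask. With an in-process (inproc) scheduler, everything shares one memory space, so handing over a result is just passing a reference; over TCP, results cross a process boundary and must be serialised on one side and deserialised on the other (dask has its own serialisation machinery; `pickle` below is just a stand-in for illustration):

```python
import pickle

# In-process (processes=False): client, scheduler and workers share memory,
# so "sending" an object is just passing a reference -- no copy is made.
payload = {"data": list(range(5))}
in_process_view = payload
assert in_process_view is payload

# TCP (processes=True): the object crosses a process boundary, so it is
# serialised to bytes and rebuilt on the other side -- a distinct copy.
wire_bytes = pickle.dumps(payload)
remote_copy = pickle.loads(wire_bytes)
assert remote_copy == payload and remote_copy is not payload
print("shared reference vs serialised copy demonstrated")
```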
Victor Faro
  • Hi Victor, I think Martin's answer is technically correct -- the difference is not local machine versus cluster but workers (and the scheduler) running as threads within the client process. Also, I think a cluster set up with `processes=False` could also be connected to via the given address and scaled up/down, but I could be mistaken. But your demo of printing the client object got me on the right track, so thank you very much for that. – jrinker Sep 06 '19 at 16:57