I'm really confused about what a worker is.

In general, I would say it is a node in a Dask cluster that can compute tasks according to the scheduler's directives. However, I thought that a single node could be a CPU core, with the number of threads per worker being at most the number of threads per CPU core. Working on a single machine, I can set a number of workers greater than the number of CPU cores in my laptop, and a number of threads per worker larger than the number of threads per CPU core.

So what actually is a worker when I set up a local cluster?

Does it refer to something physical on my machine?

Why is no error raised?

Stefano Barone

1 Answer

You can have as many threads on your system as you like, because you have a modern multitasking operating system. The OS takes care of waking threads and running them on the cores of your CPU, and in your case at most four threads can be running simultaneously. Therefore, it is probably not in your interest to have more than four Dask worker threads in total.
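To see that the OS really does not object, here is a minimal sketch using only the standard library (not Dask itself). The function name `run_oversubscribed` and the `extra` parameter are made up for illustration. It starts more threads than `os.cpu_count()` reports and waits for all of them to finish.

```python
import os
import threading
import time


def run_oversubscribed(extra=4):
    """Start more threads than logical CPUs and count how many finish."""
    cores = os.cpu_count() or 1          # logical CPUs reported by the OS
    n_threads = cores + extra            # deliberately more threads than cores
    finished = []
    lock = threading.Lock()

    def task(i):
        time.sleep(0.05)                 # a waiting thread occupies no core
        with lock:
            finished.append(i)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cores, n_threads, len(finished)


if __name__ == "__main__":
    cores, n_threads, done = run_oversubscribed()
    print(f"{cores} logical CPUs, {n_threads} threads started, {done} finished")
```

No error is raised: the scheduler in the OS simply takes turns giving the threads time on the available cores.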

You can choose how many workers (read: processes) and threads are appropriate for your application, where processes are not mutually blocked by the GIL, but threads can efficiently share memory.
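As a rough sketch of what that choice looks like, assuming `dask.distributed` is installed: the helper `describe_cluster` below is a hypothetical name, not part of Dask. It starts a `LocalCluster` with the requested layout and returns the number of threads each worker reports.

```python
from dask.distributed import Client, LocalCluster


def describe_cluster(n_workers, threads_per_worker, processes=True):
    """Start a local cluster and return {worker_address: thread_count}."""
    with LocalCluster(
        n_workers=n_workers,                  # number of workers to start
        threads_per_worker=threads_per_worker,
        processes=processes,                  # True: each worker is its own process
        dashboard_address=None,               # skip the diagnostics dashboard
    ) as cluster:
        with Client(cluster) as client:
            # Ask the scheduler how many threads each worker has
            return dict(client.nthreads())
```

Calling, say, `describe_cluster(8, 4)` on a four-core laptop should start without complaint, because the workers are ordinary OS processes and threads rather than anything tied to physical cores. When `processes=True`, call it from under an `if __name__ == "__main__":` guard, since the worker processes may re-import the main script.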

mdurant
  • hi @mdurant, I stumbled upon this article, where the person was using 50 workers per CPU! Can you clarify? – ruakn Apr 29 '22 at 08:16
  • What would you like to know? The maximum number of active threads is determined by the CPU, but you can have any number of waiting threads. – mdurant Apr 29 '22 at 13:19