When setting up a Dask cluster using Helm, there is a set of variables in the config.yaml file for customizing the number of workers, and I'm hoping for some help with the terminology. For example, if I set up a Kubernetes cluster with 16 virtual machines, each with 8 cores and 32 GB of RAM, I end up with 128 vCPUs and 512 GB of memory in total. Suppose I pass the following via `helm upgrade ... -f config.yaml`:

worker:
  name: worker
  allowed-failures: 2
  replicas: 48
  resources:
    limits: 
      cpu: 2
      memory: 8G
    requests:
      cpu: 2
      memory: 8G

It seems like I should be able to create 64 workers with 2 CPUs each and use all 512 GB of my RAM (minus the resources dedicated to the scheduler). However, in practice, the distributed client tops out at 40 workers, 80 cores, and 320 GB of total RAM.
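For reference, here is the back-of-the-envelope arithmetic I'm using (a quick sketch; the per-worker figures come from the config above):

```python
# Cluster capacity: 16 VMs x 8 cores, 16 VMs x 32 GB
vcpus = 16 * 8        # 128 vCPUs
memory_gb = 16 * 32   # 512 GB

# Per-worker requests from config.yaml
worker_cpu = 2
worker_mem_gb = 8

# Both constraints give the same answer: 64 workers
print(vcpus // worker_cpu, memory_gb // worker_mem_gb)  # -> 64 64
```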

Are there best practices for setting up pods to maximize utilization of the cluster? I know from this post that the workload comes first in deciding how many threads and processes to use per worker, but should the number of workers == the number of cores == the number of pods? If so, what is the role of the cpu keyword in the YAML above?

GHayes

1 Answer

My first guess is that other things are running on your nodes, so Kubernetes doesn't feel comfortable giving you everything that you've asked for. Kubernetes system components themselves reserve some CPU and memory on each node, so a node's allocatable resources are less than its raw capacity; you can compare the two with `kubectl describe nodes` (look at the Capacity and Allocatable sections).
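To see how a per-node reservation changes the packing, here is a rough sketch. The 0.5-CPU reservation is an assumed, illustrative figure (the real amount depends on your node size and cloud provider), but the bin-packing logic is the point: each node fits only whole workers out of its allocatable CPU.

```python
# Hypothetical per-node reservation for system daemons (kubelet,
# kube-proxy, etc.) -- the actual value varies by provider.
nodes = 16
cores_per_node = 8
reserved_cpu = 0.5
worker_cpu_request = 2  # from the config.yaml in the question

# Workers fit in whole units into each node's allocatable CPU
workers_per_node = int((cores_per_node - reserved_cpu) // worker_cpu_request)
print(workers_per_node, nodes * workers_per_node)  # -> 3 48
```

Even a small reservation drops you from 4 workers per node to 3, i.e. from 64 to 48 cluster-wide, and any other pods (scheduler, Jupyter, monitoring) reduce it further.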

MRocklin