Questions tagged [dask-jobqueue]

19 questions
5 votes • 0 answers

Dask distributed KeyError

I am trying to learn Dask using a small example. Basically I read in a file and calculate row means. from dask_jobqueue import SLURMCluster cluster = SLURMCluster(cores=4, memory='24 GB') cluster.scale(4) from dask.distributed import Client client…
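A minimal end-to-end sketch of the setup this question describes (read a file, compute row means on a SLURMCluster). The CSV path and all-numeric columns are assumptions for illustration, and it only runs on a machine with SLURM available, so treat it as a deployment config fragment:

```python
# Sketch only: requires dask, dask-jobqueue, and a SLURM scheduler.
# The file name 'data.csv' and its all-numeric columns are assumptions.
from dask_jobqueue import SLURMCluster
from dask.distributed import Client
import dask.dataframe as dd

cluster = SLURMCluster(cores=4, memory='24 GB')  # one SLURM job = 4 cores, 24 GB
cluster.scale(4)                                 # request 4 such jobs
client = Client(cluster)                         # attach this session to them

df = dd.read_csv('data.csv')                     # lazy, partitioned read
row_means = df.mean(axis=1).compute()            # per-row mean, gathered locally
```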
4 votes • 1 answer

Is there a way of using dask jobqueue over ssh

Dask jobqueue seems to be a very nice solution for distributing jobs to PBS/Slurm-managed clusters. However, if I'm understanding its use correctly, you must create an instance of "PBSCluster/SLURMCluster" on the head/login node. Then you can on the same…
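One common pattern for this situation is to keep the cluster object on the login node and reach its scheduler from elsewhere through an SSH tunnel. The hostname, user, and the default scheduler port 8786 below are all placeholder assumptions:

```shell
# Assumption: a SLURMCluster (and its scheduler) is already running in a
# long-lived session on the login node, listening on the default port 8786.
# 'user' and 'login-node' are placeholders.
ssh -N -L 8786:localhost:8786 user@login-node &

# A local Python session can then connect through the tunnel:
#   from dask.distributed import Client
#   client = Client("tcp://localhost:8786")
```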
2 votes • 1 answer

Difference between dask node and compute node for slurm configuration

First off, apologies if I use confusing or incorrect terminology; I am still learning. I am trying to set up configuration for a Slurm-enabled adaptive cluster. The supercomputer and its Slurm configuration are documented here. Here…
pgierz • 674 • 3 • 7 • 14
1 vote • 1 answer

Does Dask LocalCluster Shutdown when kernel restarts

If I restart my Jupyter kernel, will any existing LocalCluster shut down, or will the dask worker processes keep running? I know that when I used a SLURM Cluster the processes keep running if I restart my kernel without calling cluster.close(), and I have to…
HashBr0wn • 387 • 1 • 11
1 vote • 0 answers

Logging in Dask

I am using a SLURM cluster and want to be able to add custom logs inside my task that should appear in the logs on the dashboard when inspecting a particular worker. Alternatively I would like to be able to extract the name of the worker…
HashBr0wn • 387 • 1 • 11
1 vote • 1 answer

Reconfigure Dask jobqueue on the fly

I have a jobqueue configuration for Slurm which looks something like: cluster = SLURMCluster(cores=20, processes=2, memory='62GB', walltime='12:00:00', …
Albatross • 955 • 1 • 7 • 13
1 vote • 1 answer

Dask Jobqueue - Why does using processes result in cancelled jobs?

Main issue: I'm using Dask Jobqueue on a Slurm supercomputer. My workload includes a mix of threaded (e.g. numpy) and Python workloads, so I think a balance of threads and processes would be best for my deployment (which is the default behaviour).…
Albatross • 955 • 1 • 7 • 13
1 vote • 1 answer

Dask jobqueue job killed due to permission

I'm trying to use Dask job-queue on our HPC system. And this is the code I'm using: from dask_jobqueue import SLURMCluster cluster = SLURMCluster(cores=2, memory='20GB', processes=1, log_directory='logs', …
Phoenix Mu • 648 • 7 • 12
0 votes • 0 answers

Worker log file gets mangled when log_directory is set

I used to have the worker logs written as: ./slurm-.out ... I wanted SLURMCluster to write the worker logs to a separate directory (as opposed to the current working dir), so I provided "log_directory" as an input argument as such. from…
michaelgbj • 290 • 1 • 10
0 votes • 0 answers

How DASK_jobqueue SLURMCluster can access local python module in parent directory

I'm using dask_jobqueue to establish a SLURMCluster. I'm trying to pass python files in the parent directory to the workers. I tried different ways including sys.path.append, setting PYTHONPATH in my .bashrc file, and setting PYTHONPATH in env_extra…
shambakey1 • 37 • 7
0 votes • 1 answer

Job, Worker, and Task in dask_jobqueue

I am using a SLURM cluster with Dask and don't quite understand the configuration part. The documentation talks of jobs and workers and even has a section on the difference: In dask-distributed, a Worker is a Python object and node in a dask…
HashBr0wn • 387 • 1 • 11
0 votes • 1 answer

How to change dask job_name to SGECluster

I am using dask_jobqueue.SGECluster() and when I submit jobs to the grid they are all listed as dask-worker. I want to have different names for each submitted job. Here is one example: futures = [] for i in range(1,10): res =…
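A hedged sketch: the cluster's `name` parameter is what replaces the default "dask-worker" job name in the queue, so distinct names generally mean distinct cluster objects. The resource values and name below are placeholders:

```python
# Sketch: requires dask-jobqueue and an SGE scheduler. 'name' feeds the
# batch job's name (otherwise every submitted job shows up as 'dask-worker').
from dask_jobqueue import SGECluster

cluster = SGECluster(cores=1, memory='4GB', name='my-analysis')
```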
0 votes • 2 answers

Dask workers get stuck in SLURM queue and won't start until the master hits the walltime

Lately, I've been trying to do some machine learning work with Dask on an HPC cluster which uses the SLURM scheduler. Importantly, on this cluster SLURM is configured to have a hard wall-time limit of 24h per job. Initially, I ran my code with a…
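One workaround often suggested for hard walltime limits, sketched here with placeholder numbers: keep each worker job shorter than the site limit and have workers retire themselves cleanly so adaptive scaling can replace them with fresh jobs. Exact option spellings vary across dask-jobqueue/distributed versions, so treat this as an assumption-laden config fragment:

```python
# Sketch: workers ask to shut down ('--lifetime') before SLURM would kill
# them at the 24 h limit; the stagger avoids all workers retiring at once.
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=8, memory='24GB', walltime='23:30:00',
    worker_extra_args=['--lifetime', '23h', '--lifetime-stagger', '4m'],
)
cluster.adapt(minimum_jobs=1, maximum_jobs=10)  # replace retired workers
```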
0 votes • 1 answer

How to speed up launching workers when the number of workers is large?

Currently, I use dask_jobqueue to parallelize my code, and I have difficulty setting up a cluster quickly when the number of workers is large. When I scale up the number of workers (say more than 2000), it takes more than 15 mins for the cluster to…
Yuki • 1
0 votes • 1 answer

Dask: Would storage network speed cause a worker to die

I am running a process that writes large files across the storage network. I can run the process using a simple loop and I get no failures. I can run it using distributed and jobqueue during off-peak hours and no workers fail. However when I run the…
schierkolk • 29 • 4