Questions tagged [dask-jobqueue]
19 questions
5
votes
0 answers
Dask distributed KeyError
I am trying to learn Dask using a small example. Basically I read in a file and calculate row means.
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=4, memory='24 GB')
cluster.scale(4)
from dask.distributed import Client
client…

Phoenix Mu
- 648
- 7
- 12
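A minimal sketch of the pattern this excerpt starts, assuming the goal is row means over a CSV; the filename and the dask.dataframe calls are illustrative additions, not taken from the original post:

from dask_jobqueue import SLURMCluster
from dask.distributed import Client
import dask.dataframe as dd

cluster = SLURMCluster(cores=4, memory='24 GB')   # resources from the excerpt
cluster.scale(4)                                  # ask for 4 workers
client = Client(cluster)                          # attach this session to the scheduler

df = dd.read_csv('data.csv')       # placeholder filename; read lazily in partitions
row_means = df.mean(axis=1)        # per-row mean across the numeric columns
print(row_means.compute())         # triggers execution on the SLURM workers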
4
votes
1 answer
Is there a way of using dask jobqueue over ssh
Dask jobqueue seems to be a very nice solution for distributing jobs to PBS/Slurm managed clusters. However, if I'm understanding its use correctly, you must create an instance of PBSCluster/SLURMCluster on the head/login node. Then you can on the same…

Phil Reinhold
- 141
- 1
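One common workaround (offered here as a sketch, not as the accepted answer) is to keep the cluster on the login node and reach its scheduler and dashboard through an SSH tunnel from elsewhere. The hostnames, ports, and the scheduler_options argument below are assumptions about a reasonably recent dask-jobqueue release:

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

cluster = SLURMCluster(
    cores=4, memory='16 GB',
    scheduler_options={'dashboard_address': ':8787'},  # pin the dashboard port
)
cluster.scale(4)
print(cluster.scheduler_address)   # e.g. tcp://<login-node-ip>:<port>
client = Client(cluster)

# From another machine, forward the ports and connect to localhost instead:
#   ssh -L 8787:localhost:8787 -L <port>:localhost:<port> user@login-node
#   client = Client('tcp://localhost:<port>')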
2
votes
1 answer
Difference between dask node and compute node for slurm configuration
First off, apologies if I use confusing or incorrect terminology, I am still learning.
I am trying to set up configuration for a Slurm-enabled adaptive cluster.
The supercomputer and its Slurm configuration are documented here. Here…

pgierz
- 674
- 3
- 7
- 14
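For context, a hedged sketch of an adaptive SLURMCluster; the resource numbers and the queue name are illustrative, not taken from the linked supercomputer documentation:

from dask_jobqueue import SLURMCluster
from dask.distributed import Client

# One SLURM job = one allocation of the cores/memory below; dask-jobqueue
# splits it into `processes` worker processes with cores/processes threads each.
cluster = SLURMCluster(cores=20, processes=2, memory='62GB',
                       walltime='01:00:00', queue='compute')  # 'compute' is a placeholder
cluster.adapt(minimum_jobs=1, maximum_jobs=10)  # grow and shrink with the task load
client = Client(cluster)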
1
vote
1 answer
Does Dask LocalCluster Shutdown when kernel restarts
If I restart my Jupyter kernel, will any existing LocalCluster shut down, or will the dask worker processes keep running?
I know that when I used a SLURMCluster, the processes keep running if I restart my kernel without calling cluster.close(), and I have to…

HashBr0wn
- 387
- 1
- 11
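A small sketch of one way to make the shutdown deterministic regardless of what the kernel does: scope the LocalCluster and Client in a with-block so the worker processes are closed when the block exits. This is a general pattern, not the answer given to this question:

from dask.distributed import Client, LocalCluster

with LocalCluster(n_workers=4, threads_per_worker=2) as cluster, \
        Client(cluster) as client:
    result = client.submit(sum, range(10)).result()
    print(result)
# Leaving the with-block closes the client and stops the local worker processes.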
1
vote
0 answers
Logging in Dask
I am using a SLURM cluster and want to be able to add custom log messages inside my task that should appear in the logs on the dashboard when inspecting a particular worker.
Alternatively I would like to be able to extract the name of the worker…

HashBr0wn
- 387
- 1
- 11
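A hedged sketch of one common approach: log from inside the task through the "distributed.worker" logger and ask the running worker for its name via get_worker(). Whether these records surface on the dashboard depends on the logging configuration, so treat that part as an assumption:

import logging
from dask.distributed import get_worker

def my_task(x):
    logger = logging.getLogger("distributed.worker")
    worker = get_worker()                  # the worker currently running this task
    logger.info("processing %s on worker %s", x, worker.name)
    return x * 2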
1
vote
1 answer
Reconfigure Dask jobqueue on the fly
I have a jobqueue configuration for Slurm which looks something like:
cluster = SLURMCluster(cores=20,
                       processes=2,
                       memory='62GB',
                       walltime='12:00:00',
                       …

Albatross
- 955
- 1
- 7
- 13
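For reference, a hedged sketch of the usual answer to this kind of question: dask-jobqueue renders the job script when the cluster object is created, so "reconfiguring on the fly" typically means closing the cluster and building a new one. The second set of numbers is illustrative:

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(cores=20, processes=2,
                       memory='62GB', walltime='12:00:00')
print(cluster.job_script())    # inspect the generated #SBATCH header

cluster.close()                # release the existing jobs
cluster = SLURMCluster(cores=10, processes=1,          # new resource shape
                       memory='31GB', walltime='06:00:00')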
1
vote
1 answer
Dask Jobqueue - Why does using processes result in cancelled jobs?
Main issue
I'm using Dask Jobqueue on a Slurm supercomputer. My workload includes a mix of threaded (e.g. numpy) and pure Python workloads, so I think a balance of threads and processes would be best for my deployment (which is the default behaviour).…

Albatross
- 955
- 1
- 7
- 13
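A short illustration of why processes can matter here, sketched with made-up numbers: the memory given to SLURMCluster is the budget for the whole job, and it is divided evenly across the worker processes, so each worker gets a smaller limit:

from dask_jobqueue import SLURMCluster

# Each SLURM job starts 4 workers with 5 threads and roughly 20GB each;
# a worker that exceeds its share can be paused or killed by the nanny,
# or the whole job can be cancelled by SLURM if the allocation is exceeded.
cluster = SLURMCluster(cores=20, processes=4, memory='80GB')
print(cluster.job_script())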
1
vote
1 answer
Dask jobqueue job killed due to permission
I'm trying to use Dask job-queue on our HPC system. And this is the code I'm using:
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=2, memory='20GB', processes=1,
                       log_directory='logs',
                       …

Phoenix Mu
- 648
- 7
- 12
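A hedged sketch of a first thing to check in this situation: make sure the log directory exists and is writable from the compute nodes before the job tries to redirect worker output into it. The path is a placeholder:

import os
from dask_jobqueue import SLURMCluster

log_dir = os.path.abspath('logs')
os.makedirs(log_dir, exist_ok=True)    # must be on a filesystem visible to compute nodes

cluster = SLURMCluster(cores=2, memory='20GB', processes=1,
                       log_directory=log_dir)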
0
votes
0 answers
Worker log file gets mangled when log_directory is set
I used to have the worker logs named like this:
./slurm-.out
...
I wanted SLURMCluster to write the worker logs to a separate directory (instead of the current working directory), so I provided "log_directory" as an input argument, as shown below.
from…

michaelgbj
- 290
- 1
- 10
0
votes
0 answers
How can a dask_jobqueue SLURMCluster access a local python module in the parent directory
I'm using dask_jobqueue to establish a SLURMCluster. I'm trying to pass python files in the parent directory to the workers. I tried different ways including sys.path.append, setting PYTHONPATH in my .bashrc file, and setting PYTHONPATH in env_extra…

shambakey1
- 37
- 7
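A hedged sketch of the env_extra route mentioned in the excerpt: export PYTHONPATH in the generated job script so the workers can import modules from the parent directory. The path is a placeholder, and newer dask-jobqueue releases rename this parameter to job_script_prologue:

import os
from dask_jobqueue import SLURMCluster

parent = os.path.abspath('..')    # directory holding the local modules
cluster = SLURMCluster(
    cores=4, memory='16 GB',
    env_extra=[f'export PYTHONPATH={parent}:$PYTHONPATH'],
)
print(cluster.job_script())       # the export line should appear before the worker command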
0
votes
1 answer
Job, Worker, and Task in dask_jobqueue
I am using a SLURM cluster with Dask and don't quite understand the configuration part. The documentation talks of jobs and workers and even has a section on the difference:
In dask-distributed, a Worker is a Python object and node in a dask…

HashBr0wn
- 387
- 1
- 11
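A small hedged illustration of the terminology with made-up numbers: the constructor describes one job, a job can host several workers, and each worker runs many tasks on its threads:

from dask_jobqueue import SLURMCluster

# One *job* is one SLURM allocation with the cores/memory below.
# processes=2 means each job hosts 2 *workers* (Python processes),
# each with cores/processes = 5 threads that execute *tasks*.
cluster = SLURMCluster(cores=10, processes=2, memory='40GB')
cluster.scale(jobs=3)    # 3 SLURM jobs -> 6 workers -> 30 task threads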
0
votes
1 answer
How to change dask job_name to SGECluster
I am using dask_jobqueue.SGECluster() and when I submit jobs to the grid they are all listed as dask-worker. I want to have different names for each submitted job.
Here is one example:
futures = []
for i in range(1,10):
    res =…

IvanV
- 1
- 1
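A hedged sketch of one way to rename the submitted grid jobs: add an SGE -N directive to the generated script. The parameter is job_extra_directives in recent dask-jobqueue releases (job_extra in older ones), the name itself is a placeholder, and this sets one name for all of the cluster's jobs rather than one per submitted future:

from dask_jobqueue import SGECluster

cluster = SGECluster(cores=4, memory='16 GB',
                     job_extra_directives=['-N my-analysis'])
print(cluster.job_script())    # should now contain "#$ -N my-analysis"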
0
votes
2 answers
Dask workers get stuck in SLURM queue and won't start until the master hits the walltime
Lately, I've been trying to do some machine learning work with Dask on an HPC cluster which uses the SLURM scheduler. Importantly, on this cluster SLURM is configured to have a hard wall-time limit of 24h per job.
Initially, I ran my code with a…

Marta Moreno
- 1
- 2
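A hedged sketch of one mitigation often suggested for hard walltime limits: keep the worker jobs short enough to backfill into the queue and let each worker retire before its own walltime. The lifetime flags are dask worker CLI options passed through worker_extra_args (called extra in older dask-jobqueue releases); all numbers are placeholders:

from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=8, memory='32GB', walltime='02:00:00',
    worker_extra_args=['--lifetime', '110m', '--lifetime-stagger', '5m'],
)
cluster.adapt(minimum_jobs=1, maximum_jobs=20)   # replacements are requested as old workers retire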
0
votes
1 answer
How to speed up launching workers when the number of workers is large?
Currently, I use dask_jobqueue to parallelize my code, and I have difficulty setting up a cluster quickly when the number of workers is large.
When I scale up the number of workers (say more than 2000), it takes more than 15 mins for the cluster to…

Yuki
- 1
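One lever that is often suggested for this, sketched here with placeholder numbers: pack many worker processes into each SLURM job so the batch system has far fewer submissions to queue and launch:

from dask_jobqueue import SLURMCluster

# 2000 workers as 100 jobs of 20 worker processes each,
# instead of 2000 single-worker job submissions.
cluster = SLURMCluster(cores=20, processes=20, memory='80GB')
cluster.scale(jobs=100)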
0
votes
1 answer
Dask: Would storage network speed cause a worker to die
I am running a process that writes large files across the storage network. I can run the process using a simple loop and get no failures. I can run it using distributed and jobqueue during off-peak hours and no workers fail. However, when I run the…

schierkolk
- 29
- 4