Questions tagged [dask-delayed]

Dask.Delayed refers to the Python interface built around the delayed function, which wraps a function or object to create Delayed proxies. Use this tag for questions related to that interface.


290 questions
19
votes
1 answer

How to use all the cpu cores using Dask?

I have a pandas series with more than 35000 rows. I want to use dask to make it more efficient. However, both the dask code and the pandas code are taking the same time. Initially "ser" is a pandas series, and fun1 and fun2 are basic functions…
ANKIT JHA
  • 359
  • 1
  • 3
  • 9
13
votes
1 answer

Understanding memory behavior of Dask distributed

Similar to this question, I'm running into memory issues with Dask distributed. However, in my case the explanation is not that the client is trying to collect a large amount of data. The problem can be illustrated based on a very simple task graph:…
bluenote10
  • 23,414
  • 14
  • 122
  • 178
12
votes
1 answer

Unpacking result of delayed function

While converting my program using delayed, I stumbled upon a commonly used programming pattern that doesn't work with delayed. Example: from dask import delayed @delayed def myFunction(): return 1,2 a, b = myFunction() a.compute() Raises:…
Henk
  • 145
  • 1
  • 6
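A minimal sketch of one way to make tuple unpacking work with delayed, assuming the wrapped function always returns a fixed number of values. The `nout` keyword of `dask.delayed` declares that length, which makes the resulting Delayed object iterable. The name `make_pair` is hypothetical and stands in for the `myFunction` of the excerpt.

```python
from dask import delayed


def make_pair():
    # Hypothetical stand-in for the function in the question.
    return 1, 2


# nout=2 declares that the result has two elements, so the Delayed
# object can be unpacked into two separate Delayed proxies.
a, b = delayed(make_pair, nout=2)()

result_a = a.compute()
result_b = b.compute()
```

Without `nout`, dask cannot know the length of the result ahead of time, so unpacking the Delayed object fails.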
11
votes
1 answer

Sorting in Dask

I want to find an alternative to the pandas.DataFrame.sort_values function in Dask. I came across set_index, but it sorts on a single column only. How can I sort a Dask DataFrame by multiple columns?
Dhruv Kumar
  • 399
  • 2
  • 13
10
votes
1 answer

Dask delayed object of unspecified length not iterable error when combining dictionaries

I'm trying to construct a dictionary in parallel using dask, but I'm running into a TypeError: Delayed objects of unspecified length are not iterable. I'm trying to compute add, subtract, and multiply at the same time so the dictionary is…
blahblahblah
  • 2,299
  • 8
  • 45
  • 60
7
votes
1 answer

Retries in dask.compute() are unclear

From the documentation: "Number of allowed automatic retries if computing a result fails." Does "result" refer to each individual task or to the entire compute() call? If it refers to the entire call, how can I implement retries for each task in…
Michał Zawadzki
  • 695
  • 6
  • 14
7
votes
1 answer

Dask For Loop In Parallel

I am trying to find the correct syntax for using a for loop with dask delayed. I have found several tutorials and other questions but none fit my condition, which is extremely basic. First, is this the correct way to run a for-loop in…
B_Miner
  • 1,840
  • 4
  • 31
  • 66
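A minimal sketch of a common pattern for running the iterations of a for loop in parallel with delayed, assuming each iteration is independent of the others. The function `square` and the loop range are hypothetical and do not come from the question.

```python
import dask
from dask import delayed


def square(x):
    # Hypothetical per-iteration work.
    return x * x


# Build the task graph lazily: no work happens inside the loop.
tasks = [delayed(square)(i) for i in range(5)]

# A single compute call lets the scheduler run the tasks in parallel.
results = dask.compute(*tasks)
```

Calling `compute` once on all tasks, rather than once per iteration, is what gives the scheduler the chance to run them concurrently.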
7
votes
2 answers

What is the default directory where dask workers store results or files?

[mapr@impetus-i0057 latest_code_deepak]$ dask-worker 172.26.32.37:8786 distributed.nanny - INFO - Start Nanny at: 'tcp://172.26.32.36:50930' distributed.diskutils - WARNING - Found stale lock file and directory…
TheCodeCache
  • 820
  • 1
  • 7
  • 27
6
votes
0 answers

Huge memory use difference between dask and dask.distributed

I am trying to use dask.delayed to compute a large matrix for use in a later calculation. I am only ever running the code on a single local machine. When I use a dask single-machine scheduler it works fine, but is a little slow. To access more…
Nick W.
  • 61
  • 4
5
votes
1 answer

Apply a function over the columns of a Dask array

What is the most efficient way to apply a function to each column of a Dask array? As documented below, I've tried a number of things but I still suspect that my use of Dask is rather amateurish. I have a quite wide and quite long array, in the…
5
votes
1 answer

convert dask.bag of dictionaries to dask.dataframe using dask.delayed and pandas.DataFrame

I am struggling to convert a dask.bag of dictionaries into dask.delayed pandas.DataFrames and then into a final dask.dataframe. I have one function (make_dict) that reads files into a rather complex nested dictionary structure and another function (make_df)…
CFabry
  • 53
  • 2
  • 5
5
votes
2 answers

using dask for scraping via requests

I like the simplicity of dask and would love to use it for scraping a local supermarket. My multiprocessing.cpu_count() is 4, but this code only achieves a 2x speedup. Why? from bs4 import BeautifulSoup import dask, requests, time import pandas as…
Sergio Lucero
  • 862
  • 1
  • 12
  • 21
5
votes
1 answer

Using dask for task scheduling to run machine learning models in parallel

So basically what I want is to run ML Pipelines in parallel. I have been using scikit-learn, and I have decided to use DaskGridSearchCV. I have a list of gridSearchCV = DaskGridSearchCV(pipeline, grid, scoring=evaluator) objects, and I run each…
Larissa Leite
  • 1,358
  • 3
  • 21
  • 36
4
votes
1 answer

Setting maximum number of workers in Dask map function

I have a Dask process that triggers 100 workers with a map function: worker_args = .... # array with 100 elements with worker parameters futures = client.map(function_in_worker, worker_args) worker_responses = client.gather(futures) I use docker…
ps0604
  • 1,227
  • 23
  • 133
  • 330
4
votes
1 answer

Dask multi-stage resource setup causes Failed to Serialize Error

I am using the exact code from Dask's documentation at https://jobqueue.dask.org/en/latest/examples.html. In case the page changes, this is the code: from dask_jobqueue import SLURMCluster from distributed import Client from dask import delayed cluster =…
michaelgbj
  • 290
  • 1
  • 10