Questions tagged [dask-ml]
79 questions
6 votes, 2 answers
Running two dask-ml imputers simultaneously instead of sequentially
I can impute the mean and the most frequent value using dask-ml like so; this works fine:
import numpy as np
from dask_ml import impute
mean_imputer = impute.SimpleImputer(strategy='mean')
most_frequent_imputer = impute.SimpleImputer(strategy='most_frequent')
data = [[100, 2, 5], [np.nan, np.nan,…

ps0604
- 1,227
- 23
- 133
- 330
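Both strategies only need per-column statistics, so one way to avoid two sequential passes is to gather the mean and the most frequent value together in a single pass over the data. The pure-Python sketch below illustrates that idea only; `fit_mean_and_mode` and `impute` are hypothetical helpers, not dask-ml APIs. With Dask collections, the analogous step is passing several lazy results to one `dask.compute(...)` call so they are evaluated in a single scheduler run.

```python
import math
from collections import Counter


def _is_missing(value):
    return isinstance(value, float) and math.isnan(value)


def fit_mean_and_mode(rows):
    """Return (column means, column modes) in one pass, ignoring NaNs.

    Hypothetical helper for illustration; not part of dask-ml.
    """
    n_cols = len(rows[0])
    sums = [0.0] * n_cols
    counts = [0] * n_cols
    freqs = [Counter() for _ in range(n_cols)]
    for row in rows:
        for j, value in enumerate(row):
            if _is_missing(value):
                continue  # missing values contribute to neither statistic
            sums[j] += value
            counts[j] += 1
            freqs[j][value] += 1
    means = [s / c for s, c in zip(sums, counts)]
    # most_common(1) breaks ties by first-seen order; scikit-learn's
    # SimpleImputer instead returns the smallest of the tied values.
    modes = [f.most_common(1)[0][0] for f in freqs]
    return means, modes


def impute(rows, fill):
    """Replace NaNs in each column with that column's fill value."""
    return [[fill[j] if _is_missing(v) else v for j, v in enumerate(row)]
            for row in rows]
```

For example, with `rows = [[1.0, 10.0], [4.0, nan], [4.0, 20.0], [nan, 30.0]]` the means are `[3.0, 20.0]` and the modes `[4.0, 10.0]`; the tie in the second column resolves to the first value seen.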
6 votes, 2 answers
Dask distributed.scheduler - ERROR - Couldn't gather keys
import joblib
from sklearn.externals.joblib import parallel_backend
with joblib.parallel_backend('dask'):
    from dask_ml.model_selection import GridSearchCV
    import xgboost
    from xgboost import XGBRegressor
    grid_search =…

praveen pravii
- 193
- 2
- 9
5 votes, 2 answers
Why is Dask not respecting the memory limits for LocalCluster?
I'm running the code pasted below on a machine with 16 GB of RAM (on purpose).
import dask.array as da
import dask.delayed
from sklearn.datasets import make_blobs
import numpy as np
from dask_ml.cluster import KMeans
from dask.distributed import…

jcfaracco
- 853
- 2
- 6
- 21
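Two points are worth checking here. `memory_limit` on a `LocalCluster` applies to each worker process separately, so the cluster as a whole can exceed physical RAM whenever `n_workers * memory_limit` is larger than it; and memory used by the client process itself (for example, arrays built with `make_blobs` before they are handed to Dask) is not covered by any worker limit. The limits are also not hard caps: workers start spilling, pause, and are eventually restarted by their nanny at configurable fractions of the limit. The sketch below does that per-worker arithmetic; `worker_memory_plan` is a hypothetical helper, and the fraction values are assumed to be the `distributed.worker.memory.*` defaults, which your installed version or config may override.

```python
# Assumed defaults for distributed.worker.memory.{target,spill,pause,terminate};
# check your own Dask config, since these are configurable.
DEFAULT_FRACTIONS = {"target": 0.60, "spill": 0.70, "pause": 0.80, "terminate": 0.95}


def worker_memory_plan(total_bytes, n_workers, memory_limit=None,
                       fractions=DEFAULT_FRACTIONS):
    """Return (per-worker limit, per-worker thresholds, overcommitted flag).

    memory_limit applies to EACH worker, so the cluster-wide ceiling is
    n_workers * memory_limit. Without an explicit limit this sketch splits
    RAM evenly, which is roughly what memory_limit='auto' does.
    """
    per_worker = memory_limit if memory_limit is not None else total_bytes // n_workers
    thresholds = {name: int(per_worker * frac) for name, frac in fractions.items()}
    overcommitted = per_worker * n_workers > total_bytes
    return per_worker, thresholds, overcommitted
```

On a 16 GB machine with 4 workers, an even split gives 4 GB per worker; setting `memory_limit` to 6 GB per worker instead would let the workers together claim 24 GB, more than the machine has.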
4 votes, 2 answers
How to run model predictions using distributed Dask with a pre-trained Keras model?
I am loading my pre-trained Keras model and then trying to parallelize prediction over a large amount of input data using Dask. Unfortunately, I'm running into some issues related to how I'm creating my Dask array. Any guidance would be greatly…

Riley Hun
- 2,541
- 5
- 31
- 77
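A pattern often suggested for this kind of problem is to apply the model's `predict` to each chunk of the input independently and then concatenate the per-chunk results in input order; with a Dask array, `dask.array.map_blocks` applies a function block by block in this way. The sketch below illustrates only the chunk-and-reassemble logic with the standard library: `StandInModel` is a hypothetical placeholder for a loaded Keras model, and a thread pool stands in for Dask workers.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInModel:
    """Hypothetical placeholder for a loaded Keras model."""

    def predict(self, batch):
        # A real model would return one prediction per input row.
        return [sum(row) for row in batch]


def chunks(rows, size):
    """Split rows into consecutive chunks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def chunked_predict(model, rows, chunk_size, max_workers=4):
    """Predict each chunk in parallel and concatenate results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order the chunks were submitted
        per_chunk = pool.map(model.predict, chunks(rows, chunk_size))
        return [y for part in per_chunk for y in part]
```

Keeping the output order tied to the chunk order is what lets the per-chunk predictions be stitched back together without any extra bookkeeping.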
3 votes, 0 answers
dask-ml LinearRegression on multidimensional dask arrays
I am trying to perform multivariate linear regression on array data that is larger than memory. I am wondering how I should iterate a dask_ml linear regression function over a multidimensional dask array.
On small enough data, I can use…

TomNorway
- 2,584
- 1
- 19
- 26
3 votes, 2 answers
Do HyperbandCV and other incremental search algorithms work for models without partial_fit and for pipelines?
I have been digging through the GitHub pages and reading the documentation, but I don't fully understand whether HyperbandCV will help speed up hyperparameter optimization in my case.
I am using scikit-learn's Pipeline functionality. And I am…

Ife A
- 43
- 4
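According to the dask-ml documentation, `HyperbandSearchCV` (like `IncrementalSearchCV`) trains each candidate by calling `partial_fit` repeatedly, so the estimator must implement `partial_fit` and `set_params` (and provide a score method or be given `scoring`). scikit-learn's `Pipeline` does not define `partial_fit`, so a pipeline cannot be passed in directly. The check below is a hypothetical helper, not part of dask-ml, that makes the requirement concrete; the two model classes are illustrative stand-ins.

```python
# Methods the incremental searches expect the estimator to provide.
# A score method (or a `scoring` argument) is also needed but not checked here.
REQUIRED_METHODS = ("partial_fit", "set_params")


def missing_incremental_methods(estimator):
    """List the required methods that `estimator` does not implement."""
    return [name for name in REQUIRED_METHODS
            if not callable(getattr(estimator, name, None))]


# Hypothetical stand-ins: a batch-only model and an incremental one.
class BatchOnlyModel:
    def fit(self, X, y):
        return self

    def set_params(self, **params):
        return self


class IncrementalModel(BatchOnlyModel):
    def partial_fit(self, X, y):
        return self
```

Here `missing_incremental_methods(BatchOnlyModel())` reports `["partial_fit"]`, while `IncrementalModel()` passes; a model that only has `fit` falls in the first group.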
3 votes, 0 answers
Dask hangs when using dask_xgboost train method
I am trying to reproduce the dask xgboost example from the dask-ml docs at http://ml.dask.org/examples/xgboost.html. Unfortunately, Dask doesn't seem to complete the training and I'm having a hard time tracking down the meaning of the errors and…

chicagoson
- 31
- 1
2 votes, 0 answers
Using Dask to chunk a large dataset
I am now working on a large dataset of images with shape (10000000, 1, 32, 32), in the format (instances, channel, height, width). I was able to load the data and split it into chunks, but my concern now is how to train my CNN model…

Newbie
- 31
- 3
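Before choosing chunk sizes, it helps to know how large the array is. Assuming float32 pixels (4 bytes each; the question does not state the dtype), the arithmetic works out as below. Chunking only along the first (instances) axis keeps every chunk a self-contained mini-batch of whole images; the 10,000-instance chunk size is an illustrative choice, not a recommendation.

```python
from math import prod

SHAPE = (10_000_000, 1, 32, 32)  # (instances, channel, height, width), from the question
BYTES_PER_VALUE = 4              # assumption: float32


def array_bytes(shape, itemsize):
    """Total bytes for a dense array of the given shape and item size."""
    return prod(shape) * itemsize


def chunk_bounds(n_rows, rows_per_chunk):
    """(start, stop) index pairs covering 0..n_rows along the instance axis."""
    return [(start, min(start + rows_per_chunk, n_rows))
            for start in range(0, n_rows, rows_per_chunk)]


total = array_bytes(SHAPE, BYTES_PER_VALUE)                      # 40_960_000_000 bytes, about 41 GB
per_chunk = array_bytes((10_000,) + SHAPE[1:], BYTES_PER_VALUE)  # 40_960_000 bytes, about 41 MB
bounds = chunk_bounds(SHAPE[0], 10_000)                          # 1_000 chunks
```

With `dask.array`, this layout corresponds to `chunks=(10_000, 1, 32, 32)`; each block is then small enough to feed to a training step one at a time.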
2 votes, 1 answer
How do I submit a class to a Dask-Cluster?
I might be misunderstanding how Dask's submit() function works. If I submit a method of my class that initializes a parameter, it doesn't work.
Question: What is the correct way to submit a class to a dask-cluster using .submit()?
So, I…

Christine
- 53
- 8
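A likely source of confusion: on a multi-process or remote cluster, submitting a bound method serializes the object it belongs to and sends that copy to a worker, so any attribute the method sets is set on the worker's copy, not on the object in the client session. A common fix is to have the submitted function return what it computed (or the updated object) and read it back from the future. The sketch below simulates the round trip with `copy.deepcopy` standing in for serialization, rather than using a real cluster; `Model`, `initialize`, and `run_remotely` are hypothetical names.

```python
import copy


class Model:
    """Hypothetical class whose method initializes a parameter."""

    def __init__(self):
        self.param = None

    def initialize(self, value):
        self.param = value  # mutates whichever copy the method runs on
        return self         # returning self lets the caller get the updated copy back


def run_remotely(func, *args):
    """Simulate shipping func (and its bound object) to a worker and back."""
    shipped = copy.deepcopy(func)  # the worker receives a copy, not the original
    result = shipped(*args)
    return copy.deepcopy(result)   # the result travels back as a copy too


local = Model()
returned = run_remotely(local.initialize, 42)
# local.param is still None; returned.param holds the value set "remotely"
```

With a real `distributed.Client`, the equivalent would be `future = client.submit(local.initialize, 42)` followed by `local = future.result()`.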
2 votes, 1 answer
dask_xgboost.predict works but the result cannot be shown - Data must be 1-dimensional
I am trying to create a model using XGBoost.
It seems that I manage to train the model; however, when I try to predict on my test data and view the actual predictions, I get the following error:
ValueError: Data must be 1-dimensional
This is how I…

Reut
- 1,555
- 4
- 23
- 55
2 votes, 3 answers
Dask: switching between clusters or changing the cluster context
I am new to Dask, so please forgive me if this question seems silly. I am working with a Dask dataframe holding around 50 GB of string data that I need to preprocess (fast with the process) before giving it to the…

Vivek kala
- 23
- 3
2 votes, 1 answer
How do you integrate GPU support with Dask Gateway?
We are currently using Dask Gateway with CPU-only workers. However, down the road when deep learning becomes more widely adopted, we want to transition into adding GPU support for the clusters created through Dask Gateway.
I've checked the Dask…

Riley Hun
- 2,541
- 5
- 31
- 77
2 votes, 2 answers
ModuleNotFoundError: No module named 'dask_xgboost'
I am trying to run dask_ml functions, but the system does not accept my installation and gives an error when I import it. OS: Linux Ubuntu 20.
Installation into a conda environment:
conda install -c conda-forge dask-ml
Import
#dask
from dask_ml.xgboost…

sogu
- 2,738
- 5
- 31
- 90
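`dask_ml.xgboost` is a thin wrapper around the separate `dask-xgboost` package (import name `dask_xgboost`), which installing `dask-ml` does not pull in, so the import fails until that package is installed as well. Recent XGBoost releases also ship their own Dask interface as `xgboost.dask`. The helper below is a hypothetical sketch that only reports which of the two top-level packages is importable in the current environment.

```python
import importlib.util


def installed_xgboost_dask_options():
    """Report which relevant top-level packages can be imported here."""
    # dask_xgboost: the separate dask-xgboost package wrapped by dask_ml.xgboost.
    # xgboost: recent releases include xgboost.dask; only the top-level package
    # is checked, so an old XGBoost may still lack the Dask interface.
    return {name: importlib.util.find_spec(name) is not None
            for name in ("dask_xgboost", "xgboost")}
```

If `dask_xgboost` reports `False`, installing it next to dask-ml (for example `conda install -c conda-forge dask-xgboost` or `pip install dask-xgboost`) is what `dask_ml.xgboost` needs; note that the `dask-xgboost` project has since been deprecated in favor of `xgboost.dask`.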
2 votes, 0 answers
Dask OneHotEncoder handle_unknown="ignore" workaround?
I understand this is not handled right now, but it's preventing me from encoding features in real time (as in a live API service) against a trained OneHotEncoder / Pipeline.
How do people work around needing to encode data in…

svmath123
- 21
- 2
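One workaround that does not depend on estimator support is to store the category list learned at training time and encode each incoming record against it, emitting all zeros for a value never seen during training. That mirrors what `handle_unknown='ignore'` does in scikit-learn's `OneHotEncoder`. A minimal pure-Python sketch; `fit_categories` and `encode_row` are hypothetical helpers, not dask-ml APIs.

```python
def fit_categories(rows):
    """Learn the sorted set of categories seen in each column at training time."""
    n_cols = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_cols)]


def encode_row(row, categories):
    """One-hot encode a single record; unseen values become all zeros."""
    encoded = []
    for value, cats in zip(row, categories):
        encoded.extend(1 if value == c else 0 for c in cats)
    return encoded


train = [["red", "S"], ["blue", "M"], ["red", "L"]]
categories = fit_categories(train)  # [["blue", "red"], ["L", "M", "S"]]
```

Because the category lists are plain data, they can be saved alongside the trained model and reused by a live service to encode one record at a time without a Dask cluster.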
2 votes, 1 answer
Attribute error using dask-ml StandardScaler
I'm trying to reproduce the example from the dask-ml documentation (https://dask-ml.readthedocs.io/en/latest/modules/api.html), which for some reason uses sklearn:
from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1],…

Luis Ramon Ramirez Rodriguez
- 9,591
- 27
- 102
- 181
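For reference when comparing results, standard scaling subtracts each column's mean and divides by its standard deviation, using the population standard deviation (ddof=0) as scikit-learn does, and a constant column is given a scale of 1 instead of dividing by zero. The pure-Python sketch below reproduces only that arithmetic on illustrative data; `fit_scaler` and `transform` are hypothetical helpers, not the dask-ml implementation.

```python
import math


def fit_scaler(rows):
    """Per-column mean and population standard deviation (ddof=0)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    # constant columns would divide by zero; use a scale of 1 for them instead
    scales = [s if s != 0 else 1.0 for s in stds]
    return means, scales


def transform(rows, means, scales):
    """Standardize each value as (x - mean) / scale, column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, scales)] for row in rows]


data = [[0, 10], [0, 10], [2, 30], [2, 30]]  # illustrative data
means, scales = fit_scaler(data)             # means [1.0, 20.0], scales [1.0, 10.0]
```

Here every training row maps to -1.0 or 1.0 in each column, and a new row `[4, 50]` maps to `[3.0, 3.0]`; either library's `StandardScaler` should agree with these values on the same input.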