Questions tagged [dask-ml]

79 questions
6
votes
2 answers

Running two dask-ml imputers simultaneously instead of sequentially

I can impute the mean and most frequent value using dask-ml like so, and this works fine: mean_imputer = impute.SimpleImputer(strategy='mean') most_frequent_imputer = impute.SimpleImputer(strategy='most_frequent') data = [[100, 2, 5], [np.nan, np.nan,…
ps0604
  • 1,227
  • 23
  • 133
  • 330
6
votes
2 answers

Dask distributed.scheduler - ERROR - Couldn't gather keys

import joblib from sklearn.externals.joblib import parallel_backend with joblib.parallel_backend('dask'): from dask_ml.model_selection import GridSearchCV import xgboost from xgboost import XGBRegressor grid_search =…
praveen pravii
  • 193
  • 2
  • 9
5
votes
2 answers

Why is Dask not respecting the memory limits for LocalCluster?

I'm running the code pasted below on a machine with 16GB of RAM (purposely). import dask.array as da import dask.delayed from sklearn.datasets import make_blobs import numpy as np from dask_ml.cluster import KMeans from dask.distributed import…
jcfaracco
  • 853
  • 2
  • 6
  • 21
4
votes
2 answers

How To Do Model Predict Using Distributed Dask With a Pre-Trained Keras Model?

I am loading my pre-trained keras model and then trying to parallelize a large amount of input data using dask. Unfortunately, I'm running into some issues with this relating to how I'm creating my dask array. Any guidance would be greatly…
Riley Hun
  • 2,541
  • 5
  • 31
  • 77
3
votes
0 answers

dask-ml LinearRegression on multidimensional dask arrays

I am trying to perform multivariate linear regression on array data that is larger than memory. I am wondering how I should iterate a dask_ml linear regression function on a multidimensional dask array. On small enough data, I can use…
TomNorway
  • 2,584
  • 1
  • 19
  • 26
3
votes
2 answers

Do HyperbandCV and other incremental search algorithms work for models without partial_fit and for pipelines?

I have been deep diving into the GitHub pages and reading the documentation, but I do not fully understand whether HyperbandCV will be useful to speed up hyperparameter optimization in my case. I am using SKLearn's pipeline functionality. And I am…
Ife A
  • 43
  • 4
3
votes
0 answers

Dask hangs when using dask_xgboost train method

I am trying to reproduce the dask xgboost example from the dask-ml docs at http://ml.dask.org/examples/xgboost.html. Unfortunately, Dask doesn't seem to complete the training and I'm having a hard time tracking down the meaning of the errors and…
chicagoson
  • 31
  • 1
2
votes
0 answers

Using Dask to Chunk Large Dataset

I am working on a large dataset of images of shape (10000000, 1, 32, 32), where the format is (instances, channels, height, width). I was able to load the data and split it into chunks, but my concern now is how to train my CNN model…
Newbie
  • 31
  • 3
2
votes
1 answer

How do I submit a class to a Dask-Cluster?

I might be misunderstanding how Dask's submit() function works. If I submit a method of my class that initializes a parameter, it does not work. Question: What is the correct way to submit a class to a dask-cluster using .submit()? So, I…
2
votes
1 answer

dask_xgboost.predict works but cannot be shown - Data must be 1-dimensional

I am trying to create a model using XGBoost. It seems like I managed to train the model; however, when I try to predict on my test data and see the actual prediction, I get the following error: ValueError: Data must be 1-dimensional. This is how I…
Reut
  • 1,555
  • 4
  • 23
  • 55
2
votes
3 answers

Dask: switching between clusters or changing cluster context

I am new to Dask, so kindly forgive me if this question seems silly to you. In Dask, I am working with a Dask dataframe with around 50GB of data. This data is string data that I need to preprocess (fast with the process) before giving it to the…
Vivek kala
  • 23
  • 3
2
votes
1 answer

How do you integrate GPU support with Dask Gateway?

We are currently using Dask Gateway with CPU-only workers. However, down the road when deep learning becomes more widely adopted, we want to transition into adding GPU support for the clusters created through Dask Gateway. I've checked the Dask…
Riley Hun
  • 2,541
  • 5
  • 31
  • 77
2
votes
2 answers

ModuleNotFoundError: No module named 'dask_xgboost'

I am trying to run dask_ml functions, but the system does not accept my installation and gives an error when I import it. OS: Linux Ubuntu 20. Installation into a conda environment: conda install -c conda-forge dask-ml Import: #dask from dask_ml.xgboost…
sogu
  • 2,738
  • 5
  • 31
  • 90
2
votes
0 answers

Dask One Hot Encoder handle_unknown="ignore", workaround?

I understand this is not handled right now, but it's preventing me from being able to encode features in a real-time fashion (like in a live API service) against a trained OneHotEncoder / Pipeline. How do people work around needing to encode data in…
svmath123
  • 21
  • 2
2
votes
1 answer

Attribute error using dask-ml StandardScaler

I'm trying to reproduce the example in the dask-ml documentation: https://dask-ml.readthedocs.io/en/latest/modules/api.html which for some reason uses sklearn: from sklearn.preprocessing import StandardScaler data = [[0, 0], [0, 0], [1, 1],…
Luis Ramon Ramirez Rodriguez
  • 9,591
  • 27
  • 102
  • 181