I'm trying to reproduce the example from the dask-ml documentation (https://dask-ml.readthedocs.io/en/latest/modules/api.html), which for some reason uses sklearn:

from sklearn.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit(data))
# StandardScaler(copy=True, with_mean=True, with_std=True)
print(scaler.mean_)

This is the code I'm using for dask:

from dask_ml.preprocessing import StandardScaler
data = [[0, 0], [0, 0], [1, 1], [1, 1]]
scaler = StandardScaler()
print(scaler.fit(data))
# expected output: StandardScaler(copy=True, with_mean=True, with_std=True)

Which raises the following error:

AttributeError: 'list' object has no attribute 'mean'

Then I tried an example from this Medium post: https://towardsdatascience.com/speeding-up-your-algorithms-part-4-dask-7c6ed79994ef

import dask.dataframe as dd

df = dd.read_csv("test.csv", assume_missing=True)
sc = StandardScaler()
df["MSSubClass"] = sc.fit_transform(df["MSSubClass"])

Which raises this error:

AttributeError: 'Scalar' object has no attribute 'copy'

Luis Ramon Ramirez Rodriguez

1 Answer

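The problem with the example is that the data is not of the correct type. As the error message indicates, fit calls mean on its input, and a plain Python list has no such method. Converting the data to a NumPy array and casting it to float eliminates two errors. Interestingly, the transform step works even when it is given a plain list of integers, as in the last line below.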

import numpy as np
from dask_ml.preprocessing import StandardScaler

data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]]).astype('float')

scaler = StandardScaler()

print(scaler.fit(data))
print(scaler.mean_)
print(scaler.transform(data))
print(scaler.transform([[2, 2]]))
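
To sanity-check the numbers the scaler should print, the same standardization can be reproduced with plain NumPy. This is only a sketch, not dask-ml's implementation: it assumes the population standard deviation (ddof=0), which is what scikit-learn's StandardScaler uses, and standardize is a hypothetical helper name introduced for illustration.

```python
import numpy as np

# Same toy data as above, cast to float.
data = np.array([[0, 0], [0, 0], [1, 1], [1, 1]], dtype=float)

# Column-wise statistics. ddof=0 (population std) is an assumption,
# chosen to match scikit-learn's StandardScaler.
mean = data.mean(axis=0)
std = data.std(axis=0, ddof=0)

# Avoid dividing by zero for constant columns.
std_safe = np.where(std == 0, 1.0, std)


def standardize(x):
    """Hypothetical helper: apply the fitted mean and std to new rows."""
    return (np.asarray(x, dtype=float) - mean) / std_safe


print(mean)                   # [0.5 0.5]
print(standardize(data))      # rows of -1 for the zeros, +1 for the ones
print(standardize([[2, 2]]))  # [[3. 3.]]
```

If the scaler's mean_ and transform output differ from these values, the input type or dtype is the first thing I would check.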
KRKirov