I have a data set like this:

           Entity  Year  Mean
0      Afghanistan  2016  0.99
1           Africa  2016  0.99
2          Albania  2016  0.99
3          Algeria  2016  0.99
4         Americas  2016  0.99
...            ...   ...   ...
11346        World  1961  0.05
11347        Yemen  1961  0.05
11348   Yugoslavia  1961  0.05
11349       Zambia  1961  0.05
11350     Zimbabwe  1961  0.05

and I need to encode the Entity column of this data set. I used OneHotEncoder from sklearn. Here is my code:

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# One-hot encode column 0 (Entity); pass Year and Mean through unchanged
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
x_yam = np.array(ct.fit_transform(x_yam))

But after encoding it gives me something like this:

  (0, 0)    1.0
  (0, 229)  2016.0
  (0, 230)  0.99
  (1, 1)    1.0
  (1, 229)  2016.0
  (1, 230)  0.99
  (2, 2)    1.0
  (2, 229)  2016.0
  (2, 230)  0.99
  (3, 3)    1.0
  (3, 229)  2016.0
  (3, 230)  0.99
  (4, 4)    1.0
  (4, 229)  2016.0
  (4, 230)  0.99
  (5, 5)    1.0
  (5, 229)  2016.0
  (5, 230)  0.99
  (6, 6)    1.0
  (6, 229)  2016.0
  (6, 230)  0.99
  (7, 7)    1.0
  (7, 229)  2016.0
  (7, 230)  0.99
  (8, 8)    1.0
  : :

I can't use this data for my ML model, so how can I use OneHotEncoder correctly to encode my data?

Ben Reiniger
Umut K.

1 Answer

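The ColumnTransformer returns a scipy sparse matrix here because the OneHotEncoder produces sparse output and the combined result is sparse enough: with 229 one-hot columns against only 2 passthrough columns, its density falls below the transformer's sparse_threshold (0.3 by default).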

Many ML models will accept sparse input, and this will be much more memory-efficient.
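As a rough illustration of passing the sparse output straight to an estimator, here is a minimal sketch. The data frame, the target y, and the choice of Ridge are all assumptions made up for the example (the question does not say which model or target is used); 12 hypothetical entities are enough for the output to stay sparse under the default threshold.

```python
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in data: 12 made-up entities, one row each.
df = pd.DataFrame({
    "Entity": [f"Entity{i}" for i in range(12)],
    "Year": [2016] * 12,
    "Mean": [0.05 + 0.01 * i for i in range(12)],
})
y = [float(i) for i in range(12)]  # made-up target, for illustration only

# Same transformer as in the question: encode column 0, pass the rest through.
ct = ColumnTransformer([("encoder", OneHotEncoder(), [0])], remainder="passthrough")
X = ct.fit_transform(df)
print(sparse.issparse(X), X.shape)

# Ridge is one of the estimators that accepts scipy sparse input directly.
model = Ridge(alpha=1.0).fit(X, y)
preds = model.predict(X)
```

Whether a given estimator accepts sparse input is stated in the X parameter description of its fit method, so it is worth checking there before converting to dense.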

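Otherwise, you can force dense arrays throughout by specifying sparse_threshold=0.0 in the ColumnTransformer, or sparse=False in the OneHotEncoder (the parameter is named sparse_output from scikit-learn 1.2 onward). Or you can convert the sparse output to dense after transforming. The np.array(...) call you've tried does not do that, since it only wraps the sparse matrix object. Calling .toarray() on the result (or .todense(), which returns a numpy.matrix rather than an ndarray) will work (see https://stackoverflow.com/a/55639087/10495893).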
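The dense options can be sketched as follows. The data frame is hypothetical stand-in data (12 made-up entities, so that the default output is sparse as in the question), and the variable names are chosen only for this example.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in data: 12 made-up entities, one row each.
df = pd.DataFrame({
    "Entity": [f"Entity{i}" for i in range(12)],
    "Year": [2016] * 12,
    "Mean": [0.99] * 12,
})

# Default settings: the result is a scipy sparse matrix for this data.
ct_sparse = ColumnTransformer([("encoder", OneHotEncoder(), [0])],
                              remainder="passthrough")
sparse_out = ct_sparse.fit_transform(df)

# np.array(...) does not densify; it typically yields a 0-d object array
# that merely wraps the sparse matrix.
wrapped = np.array(sparse_out)

# Option 1: ask the ColumnTransformer to always return a dense array.
ct_dense = ColumnTransformer([("encoder", OneHotEncoder(), [0])],
                             remainder="passthrough", sparse_threshold=0.0)
dense_out = ct_dense.fit_transform(df)

# Option 2: convert the sparse result afterwards.
converted = sparse_out.toarray()

print(type(dense_out), dense_out.shape)
```

Both options give the same 12 x 14 array here: 12 one-hot columns followed by Year and Mean.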

Ben Reiniger