I have a data set like this:
Entity Year Mean
0 Afghanistan 2016 0.99
1 Africa 2016 0.99
2 Albania 2016 0.99
3 Algeria 2016 0.99
4 Americas 2016 0.99
... ... ... ...
11346 World 1961 0.05
11347 Yemen 1961 0.05
11348 Yugoslavia 1961 0.05
11349 Zambia 1961 0.05
11350 Zimbabwe 1961 0.05
and I need to encode the Entity column in this data set. I used OneHotEncoder in sklearn. Here is my code:
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
x_yam = np.array(ct.fit_transform(x_yam))
But after encoding it gives me something like this:
(0, 0) 1.0
(0, 229) 2016.0
(0, 230) 0.99
(1, 1) 1.0
(1, 229) 2016.0
(1, 230) 0.99
(2, 2) 1.0
(2, 229) 2016.0
(2, 230) 0.99
(3, 3) 1.0
(3, 229) 2016.0
(3, 230) 0.99
(4, 4) 1.0
(4, 229) 2016.0
(4, 230) 0.99
(5, 5) 1.0
(5, 229) 2016.0
(5, 230) 0.99
(6, 6) 1.0
(6, 229) 2016.0
(6, 230) 0.99
(7, 7) 1.0
(7, 229) 2016.0
(7, 230) 0.99
(8, 8) 1.0
: :
I can't use this data for my ML model, so how can I use OneHotEncoder correctly to encode my data?
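For reference, the printout above looks like the (row, column) value listing of a SciPy sparse matrix, which is what ColumnTransformer may return when most of the encoded values are zero. Wrapping a sparse matrix in np.array(...) does not turn it into a regular 2-D array. Below is a minimal sketch of one possible way to get a dense array instead, assuming the Entity column is at index 0 as in the code above. The small DataFrame is a made-up stand-in for the real data, and the name x_dense is only illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real data set (same column layout, fewer rows).
df = pd.DataFrame({
    "Entity": ["Afghanistan", "Africa", "Albania", "Afghanistan"],
    "Year": [2016, 2016, 2016, 1961],
    "Mean": [0.99, 0.99, 0.99, 0.05],
})
x_yam = df.values  # object array with Entity in column 0

ct = ColumnTransformer(
    transformers=[("encoder", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,  # ask ColumnTransformer to always return a dense array
)

# The passthrough columns come from an object array, so cast the result to float.
x_dense = ct.fit_transform(x_yam).astype(float)

print(type(x_dense), x_dense.shape)
print(x_dense)
```

An alternative, under the same assumptions, is to keep the original ColumnTransformer and call .toarray() on the sparse result. Many scikit-learn estimators also accept the sparse matrix directly, so densifying may not be needed at all.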