I need to convert an independent (feature) column from strings to a numeric representation, and I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which are:
Country | Age
--------------------------
Germany | 23
Spain | 25
Germany | 24
Italy | 30
I have to encode the Country column like this:
0 | 1 | 2 | 3
--------------------------------------
1 | 0 | 0 | 23
0 | 1 | 0 | 25
1 | 0 | 0 | 24
0 | 0 | 1 | 30
I managed to get the desired transformation using OneHotEncoder as follows:
#Encoding the categorical data
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
# Dummy-encode the integer labels; otherwise the machine learning algorithms
# may infer a spurious ordering such as Spain > Germany > France
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
Now I'm getting a deprecation warning telling me to use categories='auto'. If I do so, the transformation is applied to all of the independent columns (country, age, salary, etc.).
How can I apply the transformation to only the 0th column of the dataset?
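For context, here is a minimal sketch of one approach that may fit, assuming scikit-learn 0.20 or later, where ColumnTransformer is available and OneHotEncoder accepts string columns directly. The toy array only mirrors the Country/Age rows above and is not the real dataset. The step name "country" is an arbitrary label chosen for illustration.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data mirroring the table in the question (column 0 = Country, column 1 = Age).
X = np.array(
    [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
    dtype=object,
)

# One-hot encode only column 0 and pass every other column through unchanged.
# sparse_threshold=0 asks ColumnTransformer to always return a dense array.
ct = ColumnTransformer(
    transformers=[("country", OneHotEncoder(categories="auto"), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

With categories='auto' the one-hot columns follow the sorted category values (Germany, Italy, Spain here), so the column order can differ from the illustrative table above. The passed-through Age column ends up as the last column.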