
I need to transform an independent (feature) column from strings to a numerical representation. I am using OneHotEncoder for the transformation. My dataset has many independent columns, some of which look like this:

Country     |    Age       
--------------------------
Germany     |    23
Spain       |    25
Germany     |    24
Italy       |    30 

I have to encode the Country column like this:

0     |    1     |     2     |       3
--------------------------------------
1     |    0     |     0     |      23
0     |    1     |     0     |      25
1     |    0     |     0     |      24 
0     |    0     |     1     |      30

I managed to get the desired transformation using OneHotEncoder as follows:

#Encoding the categorical data
from sklearn.preprocessing import LabelEncoder

labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])

#we are dummy encoding because the machine learning algorithms would otherwise
#be confused by ordinal-looking values like Spain > Germany > France
from sklearn.preprocessing import OneHotEncoder

onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()

Now I'm getting a deprecation warning telling me to use categories='auto'. If I do so, the transformation is applied to all independent columns (country, age, salary, etc.).

How can I apply the transformation to the 0th column of the dataset only?

– Hassaan

12 Answers


There are actually two warnings:

FutureWarning: The handling of integer data will change in version 0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the unique values. If you want the future behaviour and silence this warning, you can specify "categories='auto'". In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers, then you can now use the OneHotEncoder directly.

and the second:

The 'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
"use the ColumnTransformer instead.", DeprecationWarning)

In the future, you should not select the columns inside the OneHotEncoder itself; if you want the future behaviour and want to silence the first warning, specify categories='auto'. The first message also tells you that you can now use OneHotEncoder directly, without a LabelEncoder first. Finally, the second message tells you to use ColumnTransformer, which is like a Pipeline for column transformations.

Here is the equivalent code for your case:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# The last argument ([0]) is the list of columns you want to transform in this step
ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
ct.fit_transform(X)

See also: the ColumnTransformer documentation

For the above example:

Encoding the categorical data (basically changing text to numerical data, i.e. the Country name):

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
#Encode Country Column
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
– CoMartel
  • I assigned X = ct.fit_transform(X) and it transformed the country column, but it removed the age column completely. How do I get both the transformed result and the age column data? – Hassaan Jan 25 '19 at 12:10
  • I made the correction; you have the `remainder` argument to determine what to do with unmodified columns – CoMartel Jan 25 '19 at 12:41
  • Okay, the only problem I'm facing right now is that ct.fit_transform(X) is returning a NumPy ndarray with dtype='object', which is not supported by the array editor. To overcome this I have converted the whole matrix to float. Is that the right way? – Hassaan Jan 25 '19 at 13:41
  • Just a question, because the documentation also didn't clear it up for me: what is the purpose of "Name"? – Shravya Boggarapu Jun 02 '19 at 11:13
  • `Name` is just the name of the step. You can name it as you want, and it can be useful to address this step later, for example if you just need to set/get the parameters of one step (see the sketch after these comments) – CoMartel Jun 03 '19 at 07:04
  • Use remainder='passthrough' as mentioned in the documentation, e.g. transformer = ColumnTransformer(transformers=[("Country", OneHotEncoder(), [0])], remainder='passthrough'), where "Country" is just a name and [0] is the column(s) the encoder is applied to – Swarit Agarwal Sep 04 '19 at 10:42
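As a rough illustration of the point about step names, here is a minimal sketch; the step name "country_ohe" and the toy data are made up for this example:

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# Toy data mirroring the question: Country in column 0, Age in column 1
X = np.array([["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]], dtype=object)

ct = ColumnTransformer([("country_ohe", OneHotEncoder(), [0])], remainder="passthrough")
ct.fit_transform(X)

# The step name lets you reach the fitted encoder afterwards ...
print(ct.named_transformers_["country_ohe"].categories_)

# ... or set one of its parameters through the ColumnTransformer
ct.set_params(country_ohe__handle_unknown="ignore")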

As of version 0.22, you can write the same code as below:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)

As you can see, you don't need to use LabelEncoder anymore.

– Plabon Dutta
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

transformer = ColumnTransformer(
    transformers=[
        ("Country",        # Just a name for this step
         OneHotEncoder(),  # The transformer class
         [0]               # The column(s) it is applied to
         )
    ], remainder='passthrough'
)
X = transformer.fit_transform(X)

remainder='passthrough' will keep the untouched columns, while the 0th column will be replaced by its encoded form.
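For illustration, here is a minimal sketch (the data is made up to match the shape of the question's) of the difference between the default remainder='drop' and remainder='passthrough':

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

# Made-up data shaped like the question's: Country in column 0, Age in column 1
X = np.array([["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]], dtype=object)

# remainder='drop' (the default) keeps only the transformed column(s)
dropped = ColumnTransformer([("Country", OneHotEncoder(), [0])]).fit_transform(X)
print(dropped.shape)  # (4, 3): just the three country dummy columns, Age is gone

# remainder='passthrough' appends the untouched Age column after the dummies
kept = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder="passthrough").fit_transform(X)
print(kept.shape)     # (4, 4): dummy columns plus the original Age column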

– Swarit Agarwal

Don't use the LabelEncoder; use OneHotEncoder directly:

from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
A = make_column_transformer(
    (OneHotEncoder(categories='auto'), [0]), 
    remainder="passthrough")

x=A.fit_transform(x)
– Naresh Kumar

There is also a way to do one-hot encoding with pandas:

import pandas as pd
ohe=pd.get_dummies(dataframe_name['column_name'])

Give names to the newly formed columns and add them to your dataframe, as in the sketch below. Check the pandas documentation for get_dummies.
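A minimal sketch (the dataframe below is made up to mirror the question's data) of naming the dummy columns and joining them back:

import pandas as pd

# Made-up dataframe mirroring the question's data
df = pd.DataFrame({"Country": ["Germany", "Spain", "Germany", "Italy"],
                   "Age": [23, 25, 24, 30]})

# prefix= gives the new columns readable names such as Country_Germany
ohe = pd.get_dummies(df["Country"], prefix="Country")

# Join the dummy columns back and drop the original text column
df = pd.concat([df.drop(columns="Country"), ohe], axis=1)
print(df)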

– Veera Srikanth
  • This is what I used, with one more parameter to avoid the dummy variable trap: drop_first=True – Ali Aug 21 '19 at 00:40

I had the same issue and the following worked for me:

OneHotEncoder(categories='auto', sparse=False)

Hope this helps
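For example, here is a minimal sketch of applying this encoder to the first column only; the slicing and np.hstack are my own additions, using toy data shaped like the question's:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data shaped like the question's: Country in column 0, Age in column 1
X = np.array([["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]], dtype=object)

# sparse=False returns a dense array; in scikit-learn >= 1.2 the parameter is named sparse_output
enc = OneHotEncoder(categories='auto', sparse=False)
country_dummies = enc.fit_transform(X[:, [0]])  # [[0]] keeps the column 2-D, as the encoder expects
X_encoded = np.hstack([country_dummies, X[:, 1:]])
print(X_encoded)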

– Davide Fiocco

Use the following code:

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')

X = np.array(columnTransformer.fit_transform(X), dtype=str)

print(X)
– ChrisMM
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
...
onehotencoder = ColumnTransformer(
   [('one_hot_encoder', OneHotEncoder(), [0])],
   remainder='passthrough'
)

X = onehotencoder.fit_transform(X)
# Data Preprocessing Template

# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,3].values

# Taking care of missing data
#from sklearn.preprocessing import Imputer
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])

#encoding Categorical Data
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
onehotencoder = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = "passthrough")
X = onehotencoder.fit_transform(X)


labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – dan1st Feb 05 '21 at 11:38
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
x = np.array(transformer.fit_transform(x), dtype=float)

This replaces the deprecated line

onehotencoder = OneHotEncoder(categorical_features=[0])

and should solve the error.

– DaveL17

When updating the code from this:

one_hot_encoder = OneHotEncoder(categorical_features = [0, 1, 4, 5, 6])
X_train = one_hot_encoder.fit_transform(X_train).toarray()

To this:

ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [
                       0, 1, 4, 5, 6])], remainder='passthrough')
X_train = np.array(ct.fit_transform(X_train), dtype=np.float)

Note that I had to add dtype=np.float to fix the error message TypeError: can't convert np.ndarray of type numpy.object_.

Here my columns were [0, 1, 4, 5, 6], and 'one_hot_encoder' can be any name.

My imports were:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
import numpy as np
– Zanon

I had a similar challenge because the categorical_features attribute is deprecated. The sure way is to use ColumnTransformer. This is my code below:

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer

companies = pd.read_csv(r'E:\SimpleLearn ML\1000_Companies.csv')
X = companies.iloc[:, :-1].values
y = companies.iloc[:, 4].values
companies.head()

labelencoder = LabelEncoder()
X[:, 3] = labelencoder.fit_transform(X[:,3])

onehotencoder = ColumnTransformer([("State", OneHotEncoder(), [3])], remainder = "passthrough")
X = onehotencoder.fit_transform(X)

labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
– Happy N. Monday