LabelEncoding to multiple columns in pandas

Question

I'm currently working on the Titanic dataset, which has 4-5 non-numeric columns. I want to use sklearn's `LabelEncoder` class to get encoded values for these columns. I could apply it to each column one at a time, but that becomes tedious when there are 20-30 such columns. Since I already know the names of the non-numeric columns, is there a cleaner way to encode them all at once?

Did you want `fillna` https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html — Frank AK, Dec 17 '17 at 03:42
No, I specifically mentioned sklearn.LabelEncoder in the question itself — Nuance, Dec 17 '17 at 04:27
I suggest using pandas' `get_dummies` with your list of columns to encode, like here: https://stackoverflow.com/a/43971156/1870832 — Max Power, Dec 18 '17 at 15:42
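
The `get_dummies` suggestion in the last comment could look roughly like the sketch below. Note that this is one-hot encoding, not label encoding, so it is an alternative to `LabelEncoder` rather than a drop-in replacement. The toy frame, its values, and the name `cat_cols` are assumptions made up for illustration.

```python
import pandas as pd

# Toy frame standing in for the Titanic data (values are made up)
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Age": [22.0, 38.0, 26.0],
})

# Columns to encode, listed by name as described in the question
cat_cols = ["Sex", "Embarked"]

# One indicator column per (column, category) pair; other columns are kept
encoded = pd.get_dummies(df, columns=cat_cols)
```

The result has one column per category, which avoids implying an order between categories but can get wide when a column has many distinct values.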

score -1 · Answer 1 · answered Dec 17 '17 at 08:28

Just select the object-dtype columns and loop over them:

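from sklearn.preprocessing import LabelEncoder

# Names of the non-numeric (object dtype) columns
obj_cols = df.select_dtypes(include=[object]).columns

for i in obj_cols:
    # Fit a fresh encoder per column and store the codes in a new column
    df[i + 'label'] = LabelEncoder().fit_transform(df[i])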
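
For reference, here is a self-contained sketch of the same loop on a toy frame. The frame, its values, and the `encoders` dict are assumptions added for illustration and are not part of the answer. Keeping one fitted encoder per column makes it possible to reuse the same mapping later or to call `inverse_transform`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the Titanic data (values are made up).
# The string columns are cast to object explicitly so that the dtype
# selection below does not depend on the pandas version's string default.
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Age": [22.0, 38.0, 26.0],
}).astype({"Sex": object, "Embarked": object})

# Names of the non-numeric (object dtype) columns
obj_cols = df.select_dtypes(include=[object]).columns

encoders = {}  # hypothetical helper: one fitted encoder per column
for col in obj_cols:
    le = LabelEncoder()
    df[col + "label"] = le.fit_transform(df[col])
    encoders[col] = le
```

`LabelEncoder` assigns codes in sorted order of the values, so `encoders["Sex"].classes_` holds `female` before `male` here. Real Titanic columns contain missing values, which would need to be filled before encoding.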

Abhishek Sharma

Using a single LabelEncoder object `le` will be problematic when it is used on both train and test data. – Vivek Kumar Dec 17 '17 at 10:25
It's always advisable to combine train and test data before performing label encoding. If you run the label encoder on each separately, you always run the risk of having new categories in the test data – Abhishek Sharma Dec 17 '17 at 16:26
I wouldn't say "combine train and test data before..." for anything, because the point of "test" is to simulate new data you get in production, and you don't know in advance what that data will look like – Max Power Dec 18 '17 at 16:22
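
One way to address the concern raised in these comments, without combining train and test, is sketched below. It swaps in sklearn's `OrdinalEncoder`, which none of the commenters proposed: it encodes several columns in one call and, in scikit-learn 0.24 or later, can map categories unseen during fitting to a sentinel value. The toy frames and the choice of `-1` as the sentinel are assumptions for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy train/test frames (values are made up); "Q" appears only in test
train = pd.DataFrame({"Sex": ["male", "female", "female"],
                      "Embarked": ["S", "C", "S"]})
test = pd.DataFrame({"Sex": ["female", "male"],
                     "Embarked": ["Q", "S"]})

cat_cols = ["Sex", "Embarked"]

# Learn the category-to-code mapping from the training data only;
# categories not seen during fit are encoded as -1 at transform time
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_codes = enc.fit_transform(train[cat_cols])
test_codes = enc.transform(test[cat_cols])
```

Here the unseen `"Q"` in the test frame comes out as `-1` instead of raising an error, and the mapping for all other values matches the one learned on train.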
