I'm currently working on the Titanic dataset. It has 4-5 non-numeric columns, and I want to use the `sklearn.preprocessing.LabelEncoder` class to get encoded values for them. I can, of course, apply it to each column one by one, but that becomes tedious when there are 20-30 such columns. Since I already know the names of the non-numeric columns, is there a cleaner way to encode them all at once?
-
Did you want `fillna`? https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html – Frank AK Dec 17 '17 at 03:42
-
No!!! I've mentioned sklearn.LabelEncoder term in the question itself – Nuance Dec 17 '17 at 04:27
-
Why not just use a for loop over the column names with LabelEncoder? – Vivek Kumar Dec 17 '17 at 04:47
-
I suggest using pandas' `get_dummies` with your list of columns to encode, like here: https://stackoverflow.com/a/43971156/1870832 – Max Power Dec 18 '17 at 15:42
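For readers following the `get_dummies` suggestion in the comment above, here is a minimal sketch of what it could look like. The DataFrame and its column names are a made-up stand-in, not the actual Titanic file. Note that `get_dummies` produces one-hot (indicator) columns rather than the integer labels `LabelEncoder` returns.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (column names are assumptions)
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Age": [22.0, 38.0, 26.0],
})

# The non-numeric columns, listed by name as described in the question
cols_to_encode = ["Sex", "Embarked"]

# One call encodes every listed column; the other columns pass through unchanged
encoded = pd.get_dummies(df, columns=cols_to_encode)
print(encoded.columns.tolist())
```

Each listed column is replaced by one indicator column per category, named `<column>_<category>`.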
1 Answer
-1
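Just run a loop after selecting the object-dtype columns:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Select the object-dtype (non-numeric) columns and encode each one
obj_cols = df.select_dtypes(include=[object]).columns
for i in obj_cols:
    df[i + 'label'] = le.fit_transform(df[i])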
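A possible variation on this loop, sketched under the assumption that you want to reuse or invert the encoding later: keep one fitted `LabelEncoder` per column in a dict instead of refitting a single shared object. The DataFrame, the column list, and the `encoders` dict are hypothetical names chosen for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the Titanic data (column names are assumptions)
df = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Age": [22.0, 38.0, 26.0],
})

# Columns to encode, listed by name as in the question
cols = ["Sex", "Embarked"]

# One encoder per column, so each mapping can be reused or inverted later
encoders = {}
for col in cols:
    enc = LabelEncoder()
    df[col + "label"] = enc.fit_transform(df[col])
    encoders[col] = enc

# Recover the original strings from the integer labels
original_sex = encoders["Sex"].inverse_transform(df["Sexlabel"])
```

Because every column keeps its own fitted encoder, the same mapping can be applied to new rows with `encoders[col].transform(...)`, as long as those rows contain no categories the encoder has not seen.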

Abhishek Sharma
-
Using a single LabelEncoder object `le` will be problematic when it is used on both train and test data. – Vivek Kumar Dec 17 '17 at 10:25
-
It's always advisable to combine train and test data before performing label encoding. If you run the label encoder separately on each, you always run the risk of having new categories in the test data. – Abhishek Sharma Dec 17 '17 at 16:26
-
I wouldn't say "combine train and test data before..." for anything, because the point of "test" is to simulate the new data you get in production, and you don't know in advance what that data will look like. – Max Power Dec 18 '17 at 16:22
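One way to reconcile the two positions above, offered as a sketch and not as what anyone in this thread proposed: fit the encoder on the training data only, and map categories that first appear in the test data to a sentinel value. This swaps `LabelEncoder` for scikit-learn's `OrdinalEncoder`, whose `handle_unknown="use_encoded_value"` option assumes scikit-learn 0.24 or newer. The DataFrames and column names are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical train/test split; "Q" appears only in the test data
train = pd.DataFrame({"Embarked": ["S", "C", "S"],
                      "Sex": ["male", "female", "female"]})
test = pd.DataFrame({"Embarked": ["Q", "S"],
                     "Sex": ["male", "female"]})

cols = ["Embarked", "Sex"]

# Unknown categories seen at transform time are encoded as -1
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)

# Fit on the training data only, then apply the same mapping to the test data
train_codes = enc.fit_transform(train[cols])
test_codes = enc.transform(test[cols])
```

`OrdinalEncoder` also handles several columns in one call, so no explicit loop is needed. Whether `-1` is a sensible value for unseen categories depends on the downstream model.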