How to convert a pandas dataframe from a string based categorical column to a numeric representation

Question

I have a column in a dataframe which looks like this:

df['label']

['some_label', 'some_label', 'a_diff_label', 'a_diff_label',...]

I want to convert it to something like this:

[1,1,0,0,...]

Does this answer your question? [Convert categorical data in pandas dataframe](https://stackoverflow.com/questions/32011359/convert-categorical-data-in-pandas-dataframe) — Rohit Nandi, May 29 '20 at 15:30

score 2 · Answer 1 · answered Jul 31 '18 at 03:59


There are a lot of ways to achieve this (e.g. `factorize`). Here is one using a categorical dtype:

import pandas as pd

pd.Series(['some_label', 'some_label', 'a_diff_label', 'a_diff_label']).astype('category').cat.codes
Out[19]: 
0    1
1    1
2    0
3    0
dtype: int8
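For comparison, here is a short sketch of the `factorize` route the answer alludes to, using only standard pandas calls. The two approaches number the labels differently: `pd.factorize` numbers them in order of first appearance, while the categorical dtype sorts them first.

```python
import pandas as pd

s = pd.Series(['some_label', 'some_label', 'a_diff_label', 'a_diff_label'])

# pd.factorize assigns codes in order of first appearance, not alphabetically
codes, uniques = pd.factorize(s)
print(codes.tolist())   # [0, 0, 1, 1]
print(list(uniques))    # ['some_label', 'a_diff_label']

# astype('category') sorts the categories, so the codes come out reversed here
cat = s.astype('category')
print(list(cat.cat.categories))  # ['a_diff_label', 'some_label']
print(cat.cat.codes.tolist())    # [1, 1, 0, 0]
```

If you need a specific label to be 1 (as in the question), check which ordering each method uses before relying on the codes.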

BENY

Could that be used to create values for more than two labels? Something like 'label_a' returns 0, 'label_b' returns 1, 'label_c' returns 2, etc.? – netskink Jul 31 '18 at 04:15
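A sketch addressing the comment: the same `cat.codes` approach handles any number of distinct labels. The labels `label_a`, `label_b` and `label_c` are the hypothetical ones from the comment, and the explicit ordering via `pd.CategoricalDtype` is an optional extra, not part of the original answer.

```python
import pandas as pd

s = pd.Series(['label_a', 'label_b', 'label_c', 'label_a'])

# Each distinct label gets its own code; categories are sorted, so
# 'label_a' -> 0, 'label_b' -> 1, 'label_c' -> 2
codes = s.astype('category').cat.codes
print(codes.tolist())  # [0, 1, 2, 0]

# If a specific numbering is required, list the categories explicitly
order = pd.CategoricalDtype(['label_c', 'label_b', 'label_a'])
print(s.astype(order).cat.codes.tolist())  # [2, 1, 0, 2]
```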

score 1 · Accepted Answer · answered Jul 31 '18 at 04:03

You can also use LabelEncoder from scikit-learn, which can also transform the encoded labels back if needed (see the sklearn LabelEncoder documentation):

import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

le = preprocessing.LabelEncoder()
# Labels are sorted before encoding: 'a_diff_label' -> 0, 'some_label' -> 1
df['label'] = le.fit_transform(df['label'])
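A minimal sketch of the round trip the answer mentions, using LabelEncoder's `classes_` attribute and `inverse_transform` method. It assumes scikit-learn is installed and works unchanged for more than two labels.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

le = preprocessing.LabelEncoder()
df['label'] = le.fit_transform(df['label'])
print(df['label'].tolist())  # [1, 1, 0, 0]

# classes_ holds the sorted original labels; position i corresponds to code i
print(list(le.classes_))  # ['a_diff_label', 'some_label']

# inverse_transform maps the codes back to the original strings
df['label'] = le.inverse_transform(df['label'])
print(df['label'].tolist())  # ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']
```

Keeping the fitted `le` object around is what makes the reverse mapping possible later.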

niraj

That is cool. You can also map more than two labels. Thank you. – netskink Jul 31 '18 at 04:45
`Happy Coding.` – niraj Jul 31 '18 at 04:46

I wish I could upvote this again. I was just using this on a multi-class label, and I like how you don't have to type out the labels: the encodings are determined from the existing labels. – netskink Aug 14 '18 at 13:53

score 1 · Answer 3 · answered Jul 31 '18 at 04:43

I know this has already been answered, but you might want to use a map from label to code and vice versa, with a couple of transformation functions, like this:

import pandas as pd

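# One-column DataFrame (column name 0), indexed by label
col_map = pd.DataFrame.from_dict({
    'some_label': 0,
    'a_diff_label': 1,
}, orient='index')

def label_to_code(label):
    # Select the row whose index equals the label and return its code
    return col_map[col_map.index == label][0].values[0]

def code_to_label(code):
    # Select the row whose code equals `code` and return its label
    return col_map[col_map[0] == code].index[0]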

df = pd.DataFrame(data={'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})
df['code'] = df['label'].apply(label_to_code)
df['another_label'] = df['code'].apply(code_to_label)
print(df)
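The same two-way mapping can also be sketched with plain dictionaries and `Series.map`, which avoids filtering the DataFrame once per row. This is an alternative to the answer's approach, not the author's code; the dictionary names are hypothetical.

```python
import pandas as pd

# Hypothetical label -> code mapping, plus its inverse built from it
label_codes = {'some_label': 0, 'a_diff_label': 1}
code_labels = {code: label for label, code in label_codes.items()}

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

# Series.map looks up every row in the dict; unmapped labels would become NaN
df['code'] = df['label'].map(label_codes)
df['another_label'] = df['code'].map(code_labels)
print(df)
```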

score 0 · Answer 4 · answered Jul 31 '18 at 03:57

Since the similar question I found was very complex and hard to understand, I am posting a simple answer.

Just do this:

df['label'] = (df['label'] == 'some_label').astype(int)
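A short sketch of what that one-liner does, plus one caveat: it is a binary encoding, so every value other than `some_label` becomes 0. The extra label `third_label` is hypothetical, used only to show the caveat.

```python
import pandas as pd

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

# The comparison gives a boolean Series; astype(int) turns True/False into 1/0
df['label'] = (df['label'] == 'some_label').astype(int)
print(df['label'].tolist())  # [1, 1, 0, 0]

# Caveat: any label other than 'some_label' (including typos) maps to 0
other = pd.Series(['some_label', 'a_diff_label', 'third_label'])
print((other == 'some_label').astype(int).tolist())  # [1, 0, 0]
```

For more than two labels, one of the category-based answers above is a better fit.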

netskink
