1

I have a column in a dataframe which looks like this:

df['label']

['some_label', 'some_label', 'a_diff_label', 'a_diff_label',...]

I want to convert it to something like this:

[1,1,0,0,...]
netskink
  • Does this answer your question? [Convert categorical data in pandas dataframe](https://stackoverflow.com/questions/32011359/convert-categorical-data-in-pandas-dataframe) – Rohit Nandi May 29 '20 at 15:30

4 Answers

2

There are a lot of ways to achieve this (e.g. factorize). One is to convert the column to a categorical and take its codes:

import pandas as pd

pd.Series(['some_label', 'some_label', 'a_diff_label', 'a_diff_label']).astype('category').cat.codes
Out[19]: 
0    1
1    1
2    0
3    0
dtype: int8
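
Assuming "factor" above refers to `pd.factorize`, a minimal sketch of that route could look like this (the variable names are illustrative). One difference worth knowing: `pd.factorize` numbers labels in order of first appearance, while `cat.codes` numbers them by sorted category order, so the two can assign different integers to the same label.

```python
import pandas as pd

labels = pd.Series(['some_label', 'some_label', 'a_diff_label', 'a_diff_label'])

# factorize returns integer codes plus the unique labels those codes index into
codes, uniques = pd.factorize(labels)

print(codes.tolist())   # codes assigned in order of first appearance
print(list(uniques))    # uniques[code] recovers the original label
```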
BENY
  • Could that be used to create a value for more than two labels. something like 'label_a' returns 0, 'label_b' returns 1, 'label_c' returns 2, etc? – netskink Jul 31 '18 at 04:15
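
For the case raised in this comment, here is a small sketch of the same `cat.codes` approach with three labels. The label names come from the comment; `code_to_label` is a hypothetical helper dict added only to show how codes can be decoded again.

```python
import pandas as pd

s = pd.Series(['label_a', 'label_b', 'label_c', 'label_a'])
cat = s.astype('category')

# One integer per distinct label, numbered by sorted category order
codes = cat.cat.codes

# Lookup table from code back to label, built from the category index
code_to_label = dict(enumerate(cat.cat.categories))

print(codes.tolist())
print(code_to_label)
```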
1

You can also use LabelEncoder from sklearn, which can transform the encoded values back to the original labels if needed (see the sklearn LabelEncoder documentation):

import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

le = preprocessing.LabelEncoder()
df['label'] = le.fit_transform(df['label'])
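
The reverse direction can be sketched like this, assuming scikit-learn is installed. The `code` and `decoded` column names are illustrative choices so that the original column is kept for comparison.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

le = preprocessing.LabelEncoder()
df['code'] = le.fit_transform(df['label'])

# classes_ holds the original labels; position i corresponds to code i
print(list(le.classes_))

# inverse_transform maps the integer codes back to the original strings
df['decoded'] = le.inverse_transform(df['code'])
print(df)
```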
niraj
1

I know this has already been answered, but you might want to use a map from code to label and vice versa, with a couple of transforming functions. Like this:

import pandas as pd

col_map = pd.DataFrame.from_dict({
    'some_label': 0,
    'a_diff_label': 1,
}, orient='index')

def label_to_code(label):
    return col_map[col_map.index == label][0].values[0]

def code_to_label(code):
    return col_map[col_map[0] == code].index[0]

df = pd.DataFrame(data={'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})
df['code'] = df['label'].apply(label_to_code)
df['another_label'] = df['code'].apply(code_to_label)
print(df)
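
As an alternative sketch of the same two-way mapping (not the code above, just a variant), a plain dict combined with `Series.map` avoids the row-by-row lookup. The dict names here are illustrative. Labels missing from the dict would come out as NaN.

```python
import pandas as pd

# Forward mapping chosen by hand; the reverse mapping is derived from it
label_to_code = {'some_label': 0, 'a_diff_label': 1}
code_to_label = {code: label for label, code in label_to_code.items()}

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

# Series.map looks each value up in the dict in one vectorized pass
df['code'] = df['label'].map(label_to_code)
df['another_label'] = df['code'].map(code_to_label)
print(df)
```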
JAponte
0

Since the similar question I found was very complex and hard to understand, I am posting a simple answer.

Just do this:

df['label'] = (df['label'] == 'some_label').astype(int)
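
Broken into steps, the one-liner works roughly as sketched below. Writing to a separate `code` column is an illustrative choice, not part of the original line. Every label other than 'some_label' becomes 0, so this only suits the two-label case.

```python
import pandas as pd

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

# Element-wise comparison yields a boolean Series (True where the label matches)
mask = df['label'] == 'some_label'

# Casting booleans to int turns True into 1 and False into 0
df['code'] = mask.astype(int)
print(df['code'].tolist())
```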
netskink