I have a column in a dataframe which looks like this:
df['label']
['some_label', 'some_label', 'a_diff_label', 'a_diff_label',...]
I want to convert it to something like this:
[1,1,0,0,...]
There are a lot of ways to achieve this (e.g. factorize). One is to convert the column to a categorical and take its codes:
pd.Series(['some_label', 'some_label', 'a_diff_label', 'a_diff_label']).astype('category').cat.codes
Out[19]:
0    1
1    1
2    0
3    0
dtype: int8
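For comparison, here is a minimal sketch of the factorize route mentioned above. Note that pd.factorize numbers labels in order of first appearance, whereas cat.codes numbers the sorted categories, so the codes come out flipped for this data. The variable names are only illustrative.

```python
import pandas as pd

labels = pd.Series(['some_label', 'some_label', 'a_diff_label', 'a_diff_label'])

# factorize returns the integer codes and the unique labels.
# Codes follow order of first appearance: 'some_label' -> 0, 'a_diff_label' -> 1.
codes, uniques = pd.factorize(labels)

# Indexing the uniques with the codes recovers the original labels.
restored = uniques.take(codes)

print(codes)
print(list(restored))
```

If you need exactly the [1,1,0,0] encoding from the question, the category codes above already give that, because 'a_diff_label' sorts before 'some_label'.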
You can also use LabelEncoder from sklearn, which can also transform the label encoding back if needed (see the sklearn LabelEncoder documentation):
import pandas as pd
from sklearn import preprocessing
df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})
le = preprocessing.LabelEncoder()
df['label'] = le.fit_transform(df['label'])
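To illustrate the "transform back" part, here is a small sketch that keeps the encoded values in a separate column and then reverses them with inverse_transform. The column names 'code' and 'restored' are just assumptions for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

le = LabelEncoder()
# Keep the original column and store the integer codes next to it.
df['code'] = le.fit_transform(df['label'])

# classes_ holds the sorted unique labels; each label's position is its code.
print(list(le.classes_))

# inverse_transform maps the integer codes back to the original labels.
df['restored'] = le.inverse_transform(df['code'])
print(df)
```

Because the classes are sorted, 'a_diff_label' gets 0 and 'some_label' gets 1 here, which matches the output asked for in the question.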
I know this has already been answered, but you might want to use a map from code to label and vice versa, with a couple of transforming functions. Like this:
import pandas as pd
# Lookup table: the index holds the labels, column 0 holds the codes
col_map = pd.DataFrame.from_dict({
    'some_label': 0,
    'a_diff_label': 1,
}, orient='index')

def label_to_code(label):
    return col_map[col_map.index == label][0].values[0]

def code_to_label(code):
    return col_map[col_map[0] == code].index[0]

df = pd.DataFrame(data={'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})
df['code'] = df['label'].apply(label_to_code)
df['another_label'] = df['code'].apply(code_to_label)
print(df)
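If a lookup DataFrame feels heavy, the same two-way mapping can probably be done with plain dicts and Series.map. This is only a sketch of an alternative, not what the answer above does; the names label_codes and code_labels are made up for the example.

```python
import pandas as pd

# Explicit label -> code mapping, plus its inverse built from it.
label_codes = {'some_label': 0, 'a_diff_label': 1}
code_labels = {code: label for label, code in label_codes.items()}

df = pd.DataFrame({'label': ['some_label', 'some_label', 'a_diff_label', 'a_diff_label']})

# Series.map looks each value up in the dict; labels missing from the dict become NaN.
df['code'] = df['label'].map(label_codes)
df['another_label'] = df['code'].map(code_labels)

print(df)
```

The advantage of an explicit dict is that you control which label gets which code, instead of relying on sort order or order of appearance.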
Since the similar question I found was very complex and hard to understand, I am posting a simple answer.
Just do this:
df['label'] = (df['label'] == 'some_label').astype(int)