TL;DR: What's the most concise way to encode ordered categories as numbers using a specific mapping (i.e. one that preserves the order of the categories)?
["Weak","Normal","Strong"] --> [0,1,2]
Suppose I have an ordered categorical variable, similar to the example from here:
import pandas as pd
raw_data = {'patient': [1, 1, 1, 2, 2],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df
   patient  obs  treatment   score
0        1    1          0  strong
1        1    2          1    weak
2        1    3          0  normal
3        2    1          1    weak
4        2    2          0  strong
I can create a function and apply it across my dataframe to get the desired conversion:
def score_to_numeric(x):
    if x == 'strong':
        return 3
    if x == 'normal':
        return 2
    if x == 'weak':
        return 1
df['score_num'] = df['score'].apply(score_to_numeric)
df
   patient  obs  treatment   score  score_num
0        1    1          0  strong          3
1        1    2          1    weak          1
2        1    3          0  normal          2
3        2    1          1    weak          1
4        2    2          0  strong          3
My question: Is there any way I can do this inline (w/o having to specify a separate "score_to_numeric" function)?
Maybe using some kind of lambda or replace functionality? Alternatively, this SO article suggests that Sklearn's LabelEncoder() is pretty powerful, and by extension may somehow have a way of handling this, but I haven't figured it out...
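To make the goal concrete, here is a rough sketch of the kind of inline version I'm picturing. It assumes that passing a plain dict to Series.map, or building an ordered pd.Categorical and reading its codes, is a reasonable way to do this. The names score_order, score_cat and score_code are placeholders I made up, and I'm not sure either approach is the idiomatic one.

```python
import pandas as pd

df = pd.DataFrame({'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# Option 1 (assumption): an explicit dict mapping, applied inline with Series.map.
# The dict spells out the order, so the encoding is exactly the one I choose.
score_order = {'weak': 1, 'normal': 2, 'strong': 3}
df['score_num'] = df['score'].map(score_order)

# Option 2 (assumption): an ordered categorical with the categories listed
# from lowest to highest; .codes then gives 0-based integers in that order.
score_cat = pd.Categorical(df['score'],
                           categories=['weak', 'normal', 'strong'],
                           ordered=True)
df['score_code'] = score_cat.codes

print(df)
```

If this is on the right track, the first option would reproduce the 1/2/3 values from my function, and the second would give the 0/1/2 encoding from the TL;DR.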