
TLDR: What's the most concise way to encode ordered categories as numbers using a specific mapping (i.e. one that preserves the order of the categories)?

["Weak","Normal","Strong"] --> [0,1,2]


Assuming I have an ordered categorical variable similar to the example from here:
import pandas as pd
raw_data = {'patient': [1, 1, 1, 2, 2], 
        'obs': [1, 2, 3, 1, 2], 
        'treatment': [0, 1, 0, 1, 0],
        'score': ['strong', 'weak', 'normal', 'weak', 'strong']} 
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df


   patient  obs  treatment   score
0        1    1          0  strong
1        1    2          1    weak
2        1    3          0  normal
3        2    1          1    weak
4        2    2          0  strong

I can create a function and apply it across my dataframe to get the desired conversion:

def score_to_numeric(x):
    if x=='strong':
        return 3
    if x=='normal':
        return 2
    if x=='weak':
        return 1

df['score_num'] = df['score'].apply(score_to_numeric)
df

   patient  obs  treatment   score  score_num
0        1    1          0  strong          3
1        1    2          1    weak          1
2        1    3          0  normal          2
3        2    1          1    weak          1
4        2    2          0  strong          3

My question: Is there any way I can do this inline (w/o having to specify a separate "score_to_numeric" function)?

Maybe using some kind of lambda or replace functionality? Alternatively, this SO article suggests that Sklearn's LabelEncoder() is pretty powerful, and by extension may somehow have a way of handling this, but I haven't figured it out...

Afflatus

1 Answer


You can use map() together with a dictionary that contains your mapping:

In [5]: d = {'strong':3, 'normal':2, 'weak':1}

In [7]: df['score_num'] = df.score.map(d)

In [8]: df
Out[8]:
   patient  obs  treatment   score  score_num
0        1    1          0  strong          3
1        1    2          1    weak          1
2        1    3          0  normal          2
3        2    1          1    weak          1
4        2    2          0  strong          3
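
As a possible alternative (not part of the original answer, just a sketch), an ordered pandas Categorical can also produce order-preserving codes. The column names score_cat and score_code are made up for illustration, and the codes start at 0, like the [0,1,2] in the question's TLDR. Note that map() with a dictionary gives NaN for any label missing from the dictionary, while .cat.codes gives -1 for labels outside the listed categories.

```python
import pandas as pd

df = pd.DataFrame({'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# Dictionary approach from the answer: explicit label -> number mapping.
mapping = {'weak': 1, 'normal': 2, 'strong': 3}
df['score_num'] = df['score'].map(mapping)

# Alternative sketch: declare the category order once, then read the codes.
# The position in `order` becomes the code (weak=0, normal=1, strong=2).
order = ['weak', 'normal', 'strong']
df['score_cat'] = pd.Categorical(df['score'], categories=order, ordered=True)
df['score_code'] = df['score_cat'].cat.codes

print(df)
```

The dictionary is probably the better fit when you need specific numbers (e.g. 1, 2, 3). The categorical may be handier if you also want order-aware comparisons or sorting on the column.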
MaxU - stand with Ukraine
  • Of course! Map! I can combine the two parts to get the one-liner that I was looking for: df['score_num'] = df.score.map({'strong':3, 'normal':2, 'weak':1}) – Afflatus Jun 12 '16 at 20:42
  • @Afflatus, you can do it also this way of course, but i think it's easier (more concise) to keep a dictionary separately, so you can use it multiple times, update it, etc. – MaxU - stand with Ukraine Jun 12 '16 at 20:46
  • True, but in my case I've got to do different encodings for about 10 categorical variables... I don't really want to think about naming all the dictionaries for now... If I need it later, I'll go back and extract out the code. – Afflatus Jun 12 '16 at 21:00
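
For the case of several categorical columns mentioned in the last comment, one option might be a single dictionary of dictionaries keyed by column name, so the individual mappings don't each need their own variable name. This is only a sketch: the risk column, its labels, and the _num suffix are hypothetical and not from the thread.

```python
import pandas as pd

df = pd.DataFrame({
    'score': ['strong', 'weak', 'normal'],
    'risk': ['low', 'high', 'low'],  # hypothetical second ordered column
})

# One place for all encodings: column name -> {label: number}.
encodings = {
    'score': {'weak': 1, 'normal': 2, 'strong': 3},
    'risk': {'low': 0, 'high': 1},
}

# Apply each mapping to its column and store the result in <column>_num.
for col, mapping in encodings.items():
    df[col + '_num'] = df[col].map(mapping)

print(df)
```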