
If I have a variable with two values (for example, Sex can be male or female), I use code like:

train_df["Sex"] = train_df["Sex"].apply(lambda sex: 0 if sex == 'male' else 1)

to convert the strings to integers. How can I do this when the variable takes more than two values, such as Salary categorised as low/medium/high? How can I assign the values in a similar way?

Brown Bear

2 Answers


Use map with a dictionary:

d = {
    'male': 0,
    'female': 1,
    'other': 2
}

train_df["Sex"] = train_df["Sex"].map(d)
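A small self-contained sketch of the map approach, using made-up data. It assumes a value ('unknown') that is not in the dictionary, to show that Series.map turns unmatched keys into NaN rather than raising an error, which is worth checking for before training.

```python
import pandas as pd

# Toy data (hypothetical); 'unknown' is deliberately missing from the mapping
train_df = pd.DataFrame({"Sex": ["male", "female", "other", "unknown"]})

d = {"male": 0, "female": 1, "other": 2}

# Series.map looks each value up in the dict; keys not in the dict become NaN
train_df["Sex"] = train_df["Sex"].map(d)
print(train_df)
```

Because of the NaN, the column ends up as float; if every value is guaranteed to be in the dictionary, it stays integer.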

For Salary, though, if you need to create the categories from numeric ranges, pd.cut is better:

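import numpy as np
import pandas as pd

train_df = pd.DataFrame({'Salary': [100, 200, 300, 500]})

# Right-closed bins: (0, 200] -> low, (200, 400] -> medium, (400, inf] -> high
bins = [0, 200, 400, np.inf]
labels = ['low', 'medium', 'high']
train_df['label'] = pd.cut(train_df['Salary'], bins=bins, labels=labels)
print(train_df)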
   Salary   label
0     100     low
1     200     low
2     300  medium
3     500    high
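If the goal is integers rather than text labels, one possible follow-up (a sketch, reusing the same sample data) is to take the category codes of the cut result, or to ask pd.cut for bin indices directly with labels=False. Both give low=0, medium=1, high=2 here because that is the order of the labels and bins.

```python
import numpy as np
import pandas as pd

train_df = pd.DataFrame({"Salary": [100, 200, 300, 500]})
bins = [0, 200, 400, np.inf]
labels = ["low", "medium", "high"]

train_df["label"] = pd.cut(train_df["Salary"], bins=bins, labels=labels)

# The result is categorical; .cat.codes gives each label's position
# in `labels` (low=0, medium=1, high=2)
train_df["label_code"] = train_df["label"].cat.codes

# Alternatively, labels=False makes cut return the integer bin index directly
train_df["bin_index"] = pd.cut(train_df["Salary"], bins=bins, labels=False)
print(train_df)
```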
jezrael

You can create a mapping dictionary, for example:

values = {
    "low": 0,
    "medium": 1,
    "high": 2
}
# Labels not found in the dict fall back to 0 via dict.get's default
train_df["Salary"] = train_df["Salary"].apply(lambda level: values.get(level, 0))
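Another option, sketched here with hypothetical data where Salary is already stored as text, is an ordered categorical. Listing the categories once fixes the integer each label gets, and labels outside the list become -1 instead of silently being mapped to 0.

```python
import pandas as pd

# Hypothetical data where Salary is already stored as text labels
train_df = pd.DataFrame({"Salary": ["low", "high", "medium", "low"]})

# Declaring the categories in order fixes the integer each label receives
salary_type = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)

# .cat.codes returns low=0, medium=1, high=2; labels outside the list become -1
train_df["Salary"] = train_df["Salary"].astype(salary_type).cat.codes
print(train_df)
```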
Netwave