Pandas Dataframe remap categorical column with two values to binary

Question

I have a dataframe coming in and would like to check for strings of 'Male' or 'Female', and if the dataframe contains them, replace them with '1' or '0'. At the moment I'm using the code below thanks to @Anand S Kumar's answer.

if dataframe['gender']:
    dataframe['gender'].replace([0,1],['Female','Male'],inplace=True)
if dataframe['sex']:
    dataframe['sex'].replace([0,1],['Female','Male'],inplace=True)

However, I'd like to also cover any other variations like 'male', 'M', and 'm' or 'female', 'F', 'f', and would rather avoid using two more if statements for each variation.

I've tried using a larger list such as...

dataframe['gender'].replace([0,1,0,1,0,1,0,1],['Female','Male','male','female','M','F','m','f'],inplace=True)

A dictionary...

dataframe['gender'].replace({0:'Female',1:'Male', 0:'female',1:'male',0:'F',1:'M',0:'f',1:'m'},inplace=True)

But I got the 'The truth value of a Series is ambiguous.' ValueError for both.

Does anyone know a better way, or what I'm doing wrong with my current attempts?

Thanks in advance!

Edit: My ValueError came from the if statement, not from replace: if dataframe['gender']: asks for the truth value of a whole Series, which is ambiguous. I changed it to if 'gender' in dataframe.columns: to fix it. Found the fix here.
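For anyone hitting the same error, here is a minimal sketch of the difference between the two checks. The sample frame and its values are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data, assumed for this illustration only.
dataframe = pd.DataFrame({"gender": ["Male", "Female", "Male"]})

# `if dataframe["gender"]:` asks pandas for the truth value of a whole
# Series, which raises ValueError.
try:
    if dataframe["gender"]:
        pass
    raised = False
except ValueError:
    raised = True

# Checking for the column name is unambiguous and returns a plain bool.
has_gender = "gender" in dataframe.columns
has_sex = "sex" in dataframe.columns
```

With this check, a missing column such as 'sex' is simply skipped instead of raising.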

score 7 · Accepted Answer · answered Apr 19 '18 at 16:11

Going on good faith, assuming your column contains valid data, why not replace based on the first letter of every row?

m = {'m' : 1, 'f' : 0}
df['gender'] = df['gender'].str[0].str.lower().map(m)

With map, any value whose first letter is not a key in the dictionary (and any missing value) becomes NaN.
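A small sketch of how this might behave on a column with mixed spellings, an unexpected value, and a missing value. The sample data is assumed for illustration.

```python
import pandas as pd

# Hypothetical sample column, assumed for this illustration only.
df = pd.DataFrame({"gender": ["Male", "female", "M", "f", "unknown", None]})

m = {"m": 1, "f": 0}

# Take the first character, lowercase it, then look it up in the dict.
# Anything not found in `m` (here "u" and the missing value) becomes NaN.
df["gender"] = df["gender"].str[0].str.lower().map(m)
```

Because NaN is present, the resulting column has a float dtype. Note that matching on the first letter alone also maps any other string starting with 'm' or 'f' to 1 or 0, so this relies on the column containing only gender labels.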

cs95


Worked perfectly, and the invalid default to NaN will help a ton. I need to look into using map() more. Thanks for the answer! – DForsyth Apr 19 '18 at 16:25

score 2 · Answer 2 · answered Apr 19 '18 at 16:14

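You can use .isin to match several values at once. Assign through .loc so that only the Gender column is changed, not every column in the matching rows:

df.loc[df["Gender"].isin(["MALE", "male", "Male", "m"]), "Gender"] = 1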
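A minimal sketch of the .isin approach covering both categories. The sample data, the two value lists, and the gender_code column name are assumptions made for this example; writing to a separate column keeps the original strings intact.

```python
import pandas as pd

# Hypothetical sample data, assumed for this illustration only.
df = pd.DataFrame({
    "Gender": ["MALE", "female", "m", "F"],
    "age": [31, 25, 40, 52],
})

# Assumed lists of accepted spellings; extend as needed.
male_values = ["MALE", "male", "Male", "M", "m"]
female_values = ["FEMALE", "female", "Female", "F", "f"]

# Boolean masks marking which rows match each list.
is_male = df["Gender"].isin(male_values)
is_female = df["Gender"].isin(female_values)

# Start from NaN so unmatched rows stay missing, then fill in the codes.
df["gender_code"] = float("nan")
df.loc[is_male, "gender_code"] = 1
df.loc[is_female, "gender_code"] = 0
```

Selecting the target column in .loc means other columns, such as age here, are left unchanged.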

Toby Petty

