
I have an incoming dataframe and would like to check it for the strings 'Male' and 'Female' and, wherever they appear, replace them with 1 and 0 respectively. At the moment I'm using the code below, thanks to @Anand S Kumar's answer.

if dataframe['gender']:
    dataframe['gender'].replace(['Female','Male'],[0,1],inplace=True)
if dataframe['sex']:
    dataframe['sex'].replace(['Female','Male'],[0,1],inplace=True)

However, I'd also like to cover other variations, such as 'male', 'M', and 'm' or 'female', 'F', and 'f', and would rather not add two more if statements for each variation.

I've tried using a longer list:

dataframe['gender'].replace(['Female','Male','female','male','F','M','f','m'],[0,1,0,1,0,1,0,1],inplace=True)

and a dictionary:

dataframe['gender'].replace({'Female':0,'Male':1,'female':0,'male':1,'F':0,'M':1,'f':0,'m':1},inplace=True)

but I get the ValueError 'The truth value of a Series is ambiguous.' with both.

Does anyone know a better way, or what I'm doing wrong with my current attempts?

Thanks in advance!

Edit: The ValueError came from my if statement. if dataframe['gender']: asks for the truth value of an entire Series, which is ambiguous, so pandas raises. I changed it to if 'gender' in dataframe.columns: to fix it. Found the fix here.
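A minimal sketch of the whole task with the fix from the edit applied, assuming pandas and that the labels live in a column named 'gender' or 'sex'. The helper encode_sex and the VARIANTS dict are hypothetical names; a single replace call with a string-to-number dict covers every spelling, so no extra if statements are needed per variation:

```python
import pandas as pd

# Hypothetical lookup table: every expected spelling mapped to 1 (male) or 0 (female).
VARIANTS = {
    "Male": 1, "male": 1, "M": 1, "m": 1,
    "Female": 0, "female": 0, "F": 0, "f": 0,
}


def encode_sex(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Replace known gender spellings with 1/0 in whichever label columns exist."""
    for col in ("gender", "sex"):
        # Test for the column's presence; `if dataframe[col]:` would raise
        # "The truth value of a Series is ambiguous."
        if col in dataframe.columns:
            # Assign the result back; calling replace(..., inplace=True) on
            # dataframe[col] is chained assignment, which recent pandas warns about.
            dataframe[col] = dataframe[col].replace(VARIANTS)
    return dataframe


df = pd.DataFrame({"gender": ["Male", "f", "M", "female"]})
print(encode_sex(df)["gender"].tolist())  # [1, 0, 1, 0]
```

Spellings that are not keys in VARIANTS are left unchanged by replace, so unexpected labels stay visible rather than silently becoming missing values.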

cs95
DForsyth

2 Answers


Assuming in good faith that your column contains valid data, why not map each row based on its first letter?

m = {'m' : 1, 'f' : 0}
df['gender'] = df['gender'].str[0].str.lower().map(m)

With map, any entry whose first letter is not a key in m (for example the 'u' of 'unknown') is automatically coerced to NaN, as are missing values.
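A small runnable illustration of the first-letter approach, assuming a toy DataFrame whose sample values are made up for the example. It shows valid spellings becoming 1 or 0 while an unknown label and a missing value both come out as NaN:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): valid spellings plus an unknown label and a missing value.
df = pd.DataFrame({"gender": ["Male", "female", "M", "f", "unknown", np.nan]})

m = {"m": 1, "f": 0}
# Take the first character, lowercase it, then look it up in m;
# anything not in m (here 'u' and the NaN) becomes NaN.
df["gender"] = df["gender"].str[0].str.lower().map(m)

print(df["gender"].tolist())  # [1.0, 0.0, 1.0, 0.0, nan, nan]
```

Because NaN is present, the result is a float column. Note that the first-letter rule also accepts any other word starting with m or f (for instance 'maybe'), which is why this relies on the data being valid.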

cs95
  • Worked perfectly, and the invalid default to NaN will help a ton. I need to look into using map() more. Thanks for the answer! – DForsyth Apr 19 '18 at 16:25

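You can use .isin to match several values at once. Select the rows with .loc and name the column, so that only that column is overwritten rather than every column of the matching rows:

df.loc[df["gender"].isin(["MALE", "male", "Male", "M", "m"]), "gender"] = 1
df.loc[df["gender"].isin(["FEMALE", "female", "Female", "F", "f"]), "gender"] = 0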
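A variation on the isin idea, assuming a pandas DataFrame with a 'gender' column of strings (the sample data is made up). Lowercasing the column first means each list only needs lowercase spellings, and np.select, a swapped-in technique rather than part of this answer, assigns 1, 0, or NaN in a single step:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), including an unexpected label.
df = pd.DataFrame({"gender": ["MALE", "Female", "m", "F", "other"]})

# Normalise case once so each isin list only needs lowercase spellings.
lowered = df["gender"].str.lower()
is_male = lowered.isin(["male", "m"])
is_female = lowered.isin(["female", "f"])

# 1 where male, 0 where female, NaN for anything else.
df["gender"] = np.select([is_male, is_female], [1, 0], default=np.nan)

print(df["gender"].tolist())  # [1.0, 0.0, 1.0, 0.0, nan]
```

Computing both masks before assigning avoids matching against a column that has already been partly converted.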
Toby Petty