
I currently have a CSV file with a lot of categorical variables. The data was originally exported from SPSS, and for a further cluster analysis I need the value names instead of the numeric codes. So I want to replace the int values with strings; for example, 1 stands for male and 2 stands for female:

df[(df['gender']==1)]['gender'] = 'male'

However, I know this can't work: the column originally contains int values, so replacing them with a string value is not possible. So before replacing 1 with male, I first tried to convert the column to string with the following code:

df['gender'] = df['gender'].astype(str) 

or

df['gender'].apply(str)
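A minimal sketch, assuming a small hypothetical DataFrame, of how the two conversion lines above differ: `apply(str)` returns a new Series and leaves `df` unchanged unless the result is assigned back, while the `astype(str)` line does assign it.

```python
import pandas as pd

# Hypothetical sample data: SPSS-style integer codes
df = pd.DataFrame({'gender': [1, 2, 1]})

# apply(str) builds a NEW Series; df['gender'] itself is untouched
converted = df['gender'].apply(str)
dtype_after_apply = df['gender'].dtype
print(dtype_after_apply)          # still an integer dtype

# Assigning the result back is what actually changes the column
df['gender'] = df['gender'].astype(str)
print(df['gender'].tolist())      # ['1', '2', '1']
```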

However, when I run the following code afterwards:

df[(df['gender']=='1')]['gender'] = 'male'

I get the following error:

TypeError: invalid type comparison

So I have no clue how to handle this problem :(
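For reference, a minimal sketch (hypothetical sample data) of the same row-and-column assignment done with `.loc`. The form `df[mask]['gender'] = ...` assigns into a temporary copy produced by the boolean filter, so `df` itself does not change; `.loc` selects rows and column in one step and writes into `df` directly.

```python
import pandas as pd

# Hypothetical sample data with SPSS-style codes
df = pd.DataFrame({'gender': [1, 2, 1]})
df['gender'] = df['gender'].astype(str)

# Select rows by mask and the column by name in a single .loc call,
# so the assignment modifies df rather than a filtered copy
df.loc[df['gender'] == '1', 'gender'] = 'male'
df.loc[df['gender'] == '2', 'gender'] = 'female'
print(df['gender'].tolist())      # ['male', 'female', 'male']
```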

Joy

1 Answer


I think the best approach here is `map` with a dictionary covering all possible values in `gender`; any value not in the dictionary becomes NaN:

df['gender'] = df['gender'].map({1:'male', 2:'female'}) 
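A short sketch of that behaviour, using hypothetical data where 9 stands in for a code that is missing from the dictionary. The result is written to a new column here only so the original codes stay available for the second part, which shows that the dictionary keys must match the column's dtype.

```python
import pandas as pd

# Hypothetical sample data; 9 is a code the dictionary does not cover
df = pd.DataFrame({'gender': [1, 2, 9, 1]})

labels = {1: 'male', 2: 'female'}
df['gender_label'] = df['gender'].map(labels)
print(df['gender_label'].isna().tolist())   # [False, False, True, False]

# After astype(str) the codes are '1', '2', ... so integer keys match nothing
as_text = df['gender'].astype(str)
print(as_text.map(labels).isna().all())     # True
print(as_text.map({'1': 'male', '2': 'female'}).tolist()[:2])  # ['male', 'female']
```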

The problem is probably mixed types in the column after replacing: 1 was replaced with the string male, while 2 stayed numeric.
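A minimal illustration of such a mixed column, assuming `Series.replace` is used to swap only the 1 codes (hypothetical data), next to mapping every code in one step:

```python
import pandas as pd

# Hypothetical sample column of SPSS-style codes
s = pd.Series([1, 2, 1])

# Replacing only the 1 codes leaves the 2 codes numeric,
# so the column ends up holding both str and int values
partial = s.replace({1: 'male'})
print(partial.map(lambda v: isinstance(v, str)).tolist())  # [True, False, True]

# Mapping every code at once keeps the column uniformly string-valued
full = s.map({1: 'male', 2: 'female'})
print(full.tolist())              # ['male', 'female', 'male']
```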

jezrael