How can I replace int values with string values in a dataframe

Question

I currently have a CSV file with a lot of categorical variables. The data originally comes from SPSS, and for a further cluster analysis I need the category names instead of the numeric codes. So I want to replace the int values with strings, as in the following example, where 1 stands for male and 2 stands for female:

df[(df['gender']==1)]['gender'] = 'male'

However, I know this can't work, since the column originally contains int values, so replacing them with a string value is not possible. So I first tried to convert the column to string with the following code before replacing the 1 with male:

df['gender'] = df['gender'].astype(str)

or

df['gender'].apply(str)

However when I run the following code afterwards

df[(df['gender']=='1')]['gender'] = 'male'

I get the following error

TypeError: invalid type comparison

So I have no clue how to handle this problem :(

@jezrael Most are int64, while some are float64 – Joy Jul 19 '18 at 12:00

jezrael · Accepted Answer · 2018-07-19T12:04:54.120

7

I think the best approach here is to map with a dictionary that covers all possible values in gender; any value not found in the dictionary becomes NaN:

df['gender'] = df['gender'].map({1:'male', 2:'female'})

The problem is probably mixed types in the column after replacing: the 1 values are replaced by the string male, while the original numeric 2 values remain.
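As a minimal sketch of the mapping approach, assuming a small made-up DataFrame (the sample values, the extra code 3, and the names `labels`, `mapped`, and `unmatched` are illustrative and not from the question), the following shows how unmatched codes turn into NaN and how to list them before overwriting the column:

```python
import pandas as pd

# Hypothetical sample data: the column name 'gender' and the codes 1/2
# come from the question; the code 3 is an assumed value with no label.
df = pd.DataFrame({'gender': [1, 2, 2, 1, 3]})

# Dictionary from numeric code to label.
labels = {1: 'male', 2: 'female'}

# Series.map looks up every value in the dictionary;
# codes that are not keys of the dictionary become NaN.
mapped = df['gender'].map(labels)

# List the original codes that had no label, so they can be reviewed
# before the column is overwritten.
unmatched = df.loc[mapped.isna(), 'gender'].unique()
print(unmatched)

# Assign the whole column at once, so no mix of old and new values remains.
df['gender'] = mapped
print(df)
```

Checking `unmatched` first is optional, but it helps spot codes that are missing from the dictionary before the original numbers are lost.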

edited Jul 19 '18 at 12:04

answered Jul 19 '18 at 11:57

jezrael


thank you very much! It did work! I have looked a while for this answer and finally it works. – Joy Jul 19 '18 at 12:25
@Joy - You are welcome! – jezrael Jul 19 '18 at 12:26
