Pandas - change string to a number

Question

I have a dataset with a lot of emails and I want to change this:

import pandas as pd

df = pd.DataFrame([('aatest@gmail.com', 0, 3.0), ('aatest@gmail.com', 1, 2.0),
                   ('aatest@gmail.com', 1, 3.0), ('bbtest@gmail.com', 1, 1.0),
                   ('cctest@gmail.com', 2, 5.0)])

df
                  0  1  2
0  aatest@gmail.com  0  3
1  aatest@gmail.com  1  2
2  aatest@gmail.com  1  3
3  bbtest@gmail.com  1  1
4  cctest@gmail.com  2  5

to this:

df2 = pd.DataFrame(
[(0, 0, 3.0), (0, 1, 2.0), (0,1 ,3.0), (1, 1, 1.0), (2, 2, 5.0)])

df2
   0  1  2
0  0  0  3
1  0  1  2
2  0  1  3
3  1  1  1
4  2  2  5

i.e., change each email to a number, with the same email always mapping to the same number.

How can I do this?

jezrael · Accepted Answer · 2016-01-15T16:16:41.730

Use factorize:

df[0] = pd.factorize(df[0])[0]

print(df)

   0  1  2
0  0  0  3
1  0  1  2
2  0  1  3
3  1  1  1
4  2  2  5
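
As a small sketch of what factorize returns: it gives back a pair, the integer codes and the unique values, so keeping the second element lets you map the numbers back to the emails later. The variable names codes, uniques and decoded are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([('aatest@gmail.com', 0, 3.0), ('aatest@gmail.com', 1, 2.0),
                   ('aatest@gmail.com', 1, 3.0), ('bbtest@gmail.com', 1, 1.0),
                   ('cctest@gmail.com', 2, 5.0)])
original = df[0].tolist()

# factorize returns (codes, uniques); codes are numbered by first appearance
codes, uniques = pd.factorize(df[0])
df[0] = codes

# uniques[i] is the email that was assigned code i, so indexing decodes
decoded = list(uniques[codes])

print(codes.tolist())   # [0, 0, 0, 1, 2]
print(decoded == original)  # True
```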

Or rank:

# rank returns floats, so cast to int; numbers follow sorted order of the emails
df[0] = df[0].rank(method='dense').astype(int) - 1
print(df)

   0  1  2
0  0  0  3
1  0  1  2
2  0  1  3
3  1  1  1
4  2  2  5
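
The two approaches give the same result here only because the emails already appear in alphabetical order. A quick sketch of the difference, using a made-up shuffled Series (the addresses and variable names are assumptions for illustration): factorize numbers values by first appearance, while a dense rank numbers them by sorted order. Passing sort=True to factorize gives the sorted numbering too.

```python
import pandas as pd

# Hypothetical emails that do NOT appear in alphabetical order
shuffled = pd.Series(['cc@x.com', 'aa@x.com', 'cc@x.com', 'bb@x.com'])

# Numbered by order of first appearance
by_appearance = pd.factorize(shuffled)[0]

# Numbered by sorted order of the unique values
by_sorted = pd.factorize(shuffled, sort=True)[0]

# Dense rank also follows sorted order; it returns floats, hence the cast
by_rank = shuffled.rank(method='dense').astype(int) - 1

print(by_appearance.tolist())  # [0, 1, 0, 2]
print(by_sorted.tolist())      # [2, 0, 2, 1]
print(by_rank.tolist())        # [2, 0, 2, 1]
```

Either numbering satisfies the requirement that the same email always gets the same number; the choice only matters if the numbers themselves need to follow a particular order.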

edited Jan 15 '16 at 16:16

answered Jan 15 '16 at 16:05
