I'm trying to convert a string column in a DataFrame to int. Each distinct string should be replaced with an integer key.

Data:

user_id site_id 
100     url1.com 
100     url2.com 
100     url1.com 
101     url2.com 
101     url2.com 
101     url2.com

Wanted output:

user_id site_id 
100     1 
100     2 
100     1 
101     2 
101     2 
101     2

I tried to get all unique urls with:

names = pd.unique(df.site_id.ravel()) 
urls = pd.Series(np.arange(len(names)), names) 

and then

df["site_id"] = df.applymap(urls.get)
Julien Marrec
Duesentrieb

1 Answer


You want pd.factorize, which encodes the values as integers:

In [52]:
df['site_id'] = pd.factorize(df['site_id'])[0] + 1
df

Out[52]:
   user_id  site_id
0      100        1
1      100        2
2      100        1
3      101        2
4      101        2
5      101        2

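Here factorize returns a tuple: an array of integer codes and an index of the unique values (the uniques shown are already 1 and 2 because the column was encoded in the previous step):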

In [53]:
pd.factorize(df['site_id'])

Out[53]:
(array([0, 1, 0, 1, 1, 1], dtype=int64), Int64Index([1, 2], dtype='int64'))
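
To see both parts of that return value on the original string column, here is a minimal sketch. It assumes the frame is rebuilt from the sample data in the question, and site_key is a hypothetical column name used only so the original strings stay visible:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# factorize returns (codes, uniques): codes are 0-based positions
# into uniques, assigned in order of first appearance.
codes, uniques = pd.factorize(df["site_id"])

# Hypothetical new column holding 1-based keys.
df["site_key"] = codes + 1

# uniques lets you map the codes back to the original strings.
decoded = uniques.take(codes)
```

Keeping uniques around is optional, but it is the only record of which string each key stands for once the column is overwritten.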

We want the first element of the tuple (the codes), and we add 1 to each so the keys start at 1 instead of 0:

pd.factorize(df['site_id'])[0] + 1
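
For comparison, the lookup-Series idea from the question can probably be made to work as well by mapping only the one column with Series.map. This is a sketch under the assumption that the data looks like the sample in the question; it is an alternative to factorize, not a replacement for it:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [100, 100, 100, 101, 101, 101],
    "site_id": ["url1.com", "url2.com", "url1.com",
                "url2.com", "url2.com", "url2.com"],
})

# Unique URLs in order of first appearance.
names = pd.unique(df["site_id"])

# Lookup Series: URL -> 1-based integer key.
urls = pd.Series(range(1, len(names) + 1), index=names)

# Map only the site_id column, not the whole frame.
df["site_id"] = df["site_id"].map(urls)
```

With this variant the urls Series doubles as the lookup table for translating keys back to URLs later.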
EdChum