Convert pandas series from string to unique int ids

Question

I have a categorical variable in a series. I want to assign integer ids to each unique value and create a new series with the ids, effectively turning a string variable into an integer variable. What is the most compact/efficient way to do this?

score 40 · Accepted Answer · answered Sep 21 '14 at 20:21

You could use pandas.factorize:

In [32]: s = pd.Series(['a','b','c'])

In [33]: labels, levels = pd.factorize(s)

In [35]: labels
Out[35]: array([0, 1, 2])
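
The example above has no repeated values, so it does not show how duplicates are handled. Here is a minimal sketch, assuming a made-up series with repeats, that also wraps the codes in a new series, as the question asks. The names `codes`, `uniques` and `ids` are illustrative only.

```python
import pandas as pd

# Hypothetical input with repeated values
s = pd.Series(['a', 'b', 'a', 'c', 'b'])

# factorize returns integer codes plus the unique values they index into;
# codes are assigned in order of first appearance
codes, uniques = pd.factorize(s)

# Wrap the codes in a Series that shares the original index
ids = pd.Series(codes, index=s.index)

print(ids.tolist())    # [0, 1, 0, 2, 1]
print(list(uniques))   # ['a', 'b', 'c']
```

Each distinct string gets a single id, and `uniques[code]` maps an id back to its original string.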

unutbu


Note that from 0.15 (to be released in the coming weeks), there will be more integrated categorical support, see http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#whatsnew-0150-cat – joris Sep 21 '14 at 20:22

score 19 · Answer 2 · answered Jul 16 '15 at 21:28

Example using the categorical dtype, available in pandas 0.15 and later:

http://pandas.pydata.org/pandas-docs/version/0.16.2/categorical.html

In [553]: x = pd.Series(['a', 'a', 'a', 'b', 'b', 'c']).astype('category')

In [554]: x
Out[554]: 
0    a
1    a
2    a
3    b
4    b
5    c
dtype: category
Categories (3, object): [a, b, c]

In [555]: x.cat.codes
Out[555]: 
0    0
1    0
2    0
3    1
4    1
5    2
dtype: int8
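
A short sketch of two details the session above does not show: how missing values are coded, and how to map codes back to the original strings. The input series and the names `codes`, `categories`, `valid` and `restored` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical input containing a missing value
x = pd.Series(['a', 'a', 'b', None, 'c']).astype('category')

codes = x.cat.codes             # missing values are coded as -1
categories = x.cat.categories   # lookup table: code -> original value

print(codes.tolist())           # [0, 0, 1, -1, 2]
print(list(categories))         # ['a', 'b', 'c']

# Map the non-missing codes back to their original strings
valid = codes[codes >= 0]
restored = categories.take(valid.to_numpy())
print(list(restored))           # ['a', 'a', 'b', 'c']
```

Unlike `pd.factorize`, which numbers values in order of first appearance by default, the categorical codes here follow the sorted order of the categories.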
