Say I have `x = ["apple", "orange", "orange", "apple", "pear"]`.
I would like a categorical representation with integers, e.g. `y = [1, 2, 2, 1, 3]`. What would be the best way to do this?
Hanan Shteingart
What do the integers represent? – gtlambert Jan 14 '16 at 12:15
Presumably you checked the [docs](http://pandas.pydata.org/pandas-docs/stable/categorical.html)? – EdChum Jan 14 '16 at 12:18
If you're working with `numpy`, you can simply use `np.unique(["apple","orange","orange","apple","pear"], return_inverse=True)[1]` without turning to pandas. – Sergey Bushmanov Jan 14 '16 at 12:30
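A minimal sketch of the NumPy approach from the comment above, shifted to the 1-based labels the question asks for. `np.unique` sorts the unique values, so the labels follow alphabetical order (which coincides with first-appearance order for this particular list).

```python
import numpy as np

x = ["apple", "orange", "orange", "apple", "pear"]

# np.unique returns the sorted unique values; with return_inverse=True it also
# returns, for each element of x, the index of that element in the unique array.
uniques, inverse = np.unique(x, return_inverse=True)

# Shift by one to get 1-based labels as in the question.
y = inverse + 1
print(uniques)  # ['apple' 'orange' 'pear']
print(y)        # [1 2 2 1 3]
```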
3 Answers
You could use `pd.factorize` and take element `[0]` of the tuple it returns (the integer codes); adding 1 makes the codes start at 1:
```python
In [465]: pd.factorize(x)
Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))
In [466]: pd.factorize(x)[0] + 1
Out[466]: array([1, 2, 2, 1, 3])
```

Anton Protopopov
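As a plain-Python illustration of what `pd.factorize(x)[0] + 1` computes for this input: each distinct value gets the next integer the first time it is seen. The helper name `factorize_1based` is hypothetical, not part of pandas, and unlike pandas this sketch does no special handling of missing values (pandas codes NaN as -1 by default).

```python
def factorize_1based(values):
    """Label each value by the order in which it first appears, starting at 1.

    Hypothetical helper sketching the idea behind pd.factorize(x)[0] + 1.
    Returns (codes, uniques), with uniques in first-appearance order.
    """
    labels = {}  # value -> 1-based label, insertion-ordered
    codes = []
    for v in values:
        if v not in labels:
            labels[v] = len(labels) + 1
        codes.append(labels[v])
    return codes, list(labels)


x = ["apple", "orange", "orange", "apple", "pear"]
codes, uniques = factorize_1based(x)
print(codes)    # [1, 2, 2, 1, 3]
print(uniques)  # ['apple', 'orange', 'pear']
```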
You can use:
```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]
s = pd.Series(x)
print(s)
# 0     apple
# 1    orange
# 2    orange
# 3     apple
# 4      pear
# dtype: object

print(pd.Categorical(s).codes)
# [0 1 1 0 2]
```
Or:
```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]
print(pd.Categorical(x).codes)
# [0 1 1 0 2]
```

jezrael
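The two answers agree on the question's list, but they number values differently in general: `pd.Categorical` infers its categories in sorted order, while `pd.factorize` numbers values in order of first appearance unless `sort=True` is passed. A short sketch with a reordered, made-up list shows the difference; passing `categories=` explicitly lets you choose the mapping yourself. All codes are 0-based, so add 1 to get labels like those in the question.

```python
import pandas as pd

# Illustrative list whose first-appearance order (pear, apple, orange)
# differs from alphabetical order (apple, orange, pear).
x = ["pear", "apple", "orange", "apple", "pear"]

# Categorical sorts the inferred categories, so codes follow alphabetical order.
cat_codes = pd.Categorical(x).codes
# factorize numbers values in order of first appearance by default...
fact_codes = pd.factorize(x)[0]
# ...or in sorted order with sort=True, matching Categorical.
sorted_codes = pd.factorize(x, sort=True)[0]
# Explicit categories fix the mapping: orange -> 0, pear -> 1, apple -> 2.
fixed = pd.Categorical(x, categories=["orange", "pear", "apple"]).codes

print(cat_codes)     # [2 0 1 0 2]
print(fact_codes)    # [0 1 2 1 0]
print(sorted_codes)  # [2 0 1 0 2]
print(fixed)         # [1 2 0 2 1]
```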