Say I have x=["apple","orange","orange","apple","pear"]. I would like a categorical representation with integers, e.g. y=[1,2,2,1,3]. What would be the best way to do this?

Hanan Shteingart

3 Answers

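You could use pd.factorize and take element 0 of the returned tuple (the integer codes, starting at 0); add 1 to get the 1-based numbering from the question: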

In [465]: pd.factorize(x)
Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))

In [466]: pd.factorize(x)[0] + 1
Out[466]: array([1, 2, 2, 1, 3])
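To make the mechanism concrete: by default (sort=False), pd.factorize numbers distinct values in order of first appearance. The sketch below reproduces that numbering in plain Python for comparison. `factorize_plain` is a hypothetical helper, not a pandas function, and unlike pd.factorize it does not treat missing values specially (pandas codes them as -1 by default). The list is wrapped in a Series before calling pd.factorize, which is the safer input type across pandas versions.

```python
import pandas as pd

def factorize_plain(values, start=1):
    """Hypothetical helper: number each distinct value in order of first appearance."""
    mapping = {}
    codes = []
    for v in values:
        # setdefault stores the next integer only the first time a value is seen
        codes.append(mapping.setdefault(v, len(mapping) + start))
    return codes, list(mapping)

x = ["apple", "orange", "orange", "apple", "pear"]
codes, uniques = factorize_plain(x)
print(codes)    # [1, 2, 2, 1, 3]
print(uniques)  # ['apple', 'orange', 'pear']

# The same numbering via pandas, shifted from 0-based to 1-based
pd_codes, pd_uniques = pd.factorize(pd.Series(x))
print((pd_codes + 1).tolist())  # [1, 2, 2, 1, 3]
```

The uniques array returned by pd.factorize lets you map codes back to labels: pd_uniques[pd_codes] recovers the original values.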
Anton Protopopov

You can use the codes of a pd.Categorical (they start at 0; add 1 if the numbering should start at 1):

import pandas as pd

x=["apple","orange","orange","apple","pear"]
s = pd.Series(x)

print(s)

0     apple
1    orange
2    orange
3     apple
4      pear

print(pd.Categorical(s).codes)

[0 1 1 0 2]

Or:

import pandas as pd

x=["apple","orange","orange","apple","pear"]

print(pd.Categorical(x).codes)

#[0 1 1 0 2]
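One caveat worth checking: when pd.Categorical infers its categories, it sorts them, so the codes follow sorted order, whereas pd.factorize (default sort=False) follows order of first appearance. For the question's input the two orders happen to coincide. The sketch below uses a reordered input, chosen purely for illustration, to show where they differ and that factorize(sort=True) matches the Categorical numbering.

```python
import pandas as pd

# Illustrative input (not from the question) where first-appearance order != sorted order
x = ["pear", "apple", "orange", "apple", "pear"]

# pd.Categorical infers categories in sorted order: apple, orange, pear
cat = pd.Categorical(x)
print(list(cat.categories))      # ['apple', 'orange', 'pear']
print((cat.codes + 1).tolist())  # [3, 1, 2, 1, 3]

# pd.factorize numbers values by first appearance: pear, apple, orange
codes, uniques = pd.factorize(pd.Series(x))
print((codes + 1).tolist())      # [1, 2, 3, 2, 1]

# sort=True makes factorize use sorted order, matching pd.Categorical
sorted_codes, _ = pd.factorize(pd.Series(x), sort=True)
print((sorted_codes + 1).tolist())  # [3, 1, 2, 1, 3]
```

If a specific order is required, passing it explicitly, e.g. pd.Categorical(x, categories=["apple", "orange", "pear"]), avoids relying on either default.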
jezrael

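With pandas: pd.Series(x).astype('category').cat.codes (x is a plain list, which has no astype method, so wrap it in a Series first; add 1 for 1-based codes)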
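Because x in the question is a plain list, it has to be wrapped in a pandas Series before astype("category") and the .cat accessor can be used. A minimal runnable version of this approach, including the shift to the 1-based codes the question asks for:

```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]

# The .cat accessor exists only on a categorical Series, so convert the list first
s = pd.Series(x).astype("category")

codes = s.cat.codes + 1          # shift from 0-based to 1-based
print(codes.tolist())            # [1, 2, 2, 1, 3]
print(list(s.cat.categories))    # ['apple', 'orange', 'pear']
```

As with pd.Categorical, the inferred categories are sorted, so code 1 corresponds to the alphabetically first label rather than the first one seen.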

Hanan Shteingart