
I have a pandas Series of datetime values. I would like to map each distinct value to a unique integer, so that equal dates get the same integer. I am looking for a direct, fast command, as the data is large.

Example:

           0
    0  2015-07-05
    1  2015-07-12
    3  2015-07-19
    4  2015-07-12

Should be converted to:

       0
    0  1
    1  2
    3  3
    4  2

In fact, I am also wondering whether there is a general-purpose command that converts a Series of any data type into a Series of unique integers in this way.

Shivkumar kondi
splinter

1 Answer


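Use factorize, which assigns each distinct value an integer code in order of first appearance. The codes start at 0, so add 1 to match the desired output: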

import pandas as pd

s = pd.Series(['2015-07-05', '2015-07-12', '2015-07-19', '2015-07-12'], name=0)
print(s)
0    2015-07-05
1    2015-07-12
2    2015-07-19
3    2015-07-12
Name: 0, dtype: object

# factorize returns (codes, uniques); keep the codes and reuse the original index
s1 = pd.Series(pd.factorize(s)[0] + 1, s.index)
print(s1)
0    1
1    2
2    3
3    2
dtype: int64
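The question's data is datetime-typed and has a non-contiguous index, while the example above uses strings. The following is a minimal sketch of the same idea on datetime values, assuming they are parsed with pd.to_datetime. The names dates and date_ids are illustrative and do not come from the original post.

```python
import pandas as pd

# Datetime values with the non-contiguous index shown in the question.
dates = pd.Series(
    pd.to_datetime(["2015-07-05", "2015-07-12", "2015-07-19", "2015-07-12"]),
    index=[0, 1, 3, 4],
)

# factorize returns (codes, uniques); codes are 0-based, so add 1.
codes, uniques = pd.factorize(dates)

# Reattach the original index so the labels 0, 1, 3, 4 are preserved.
date_ids = pd.Series(codes + 1, index=dates.index)

print(date_ids.tolist())        # [1, 2, 3, 2]
print(date_ids.index.tolist())  # [0, 1, 3, 4]
```

The uniques value returned alongside the codes can be kept if the integers later need to be mapped back to the original dates.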

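Another possible solution is rank with method='dense'. It numbers the values by sorted order rather than by first appearance; the two results coincide here only because the distinct values first appear in ascending order:

s1 = s.rank(method='dense').astype(int)
print(s1)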
0    1
1    2
2    3
3    2
Name: 0, dtype: int32
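The two approaches can give different integers when the values do not first appear in ascending order. The sketch below illustrates this on a small made-up Series (the variable names are hypothetical), and also shows that passing sort=True to factorize should reproduce the dense-rank numbering.

```python
import pandas as pd

# Made-up data in which the latest date appears first.
unsorted = pd.Series(
    pd.to_datetime(["2015-07-19", "2015-07-05", "2015-07-19", "2015-07-12"])
)

# Codes in order of first appearance.
by_appearance = pd.factorize(unsorted)[0] + 1

# Codes in order of sorted value.
by_value = unsorted.rank(method="dense").astype(int)

# factorize with sort=True sorts the uniques before assigning codes.
by_value_factorize = pd.factorize(unsorted, sort=True)[0] + 1

print(by_appearance.tolist())       # [1, 2, 1, 3]
print(by_value.tolist())            # [3, 1, 3, 2]
print(by_value_factorize.tolist())  # [3, 1, 3, 2]
```

If only uniqueness matters, either numbering works; if the integers should also reflect chronological order, the sorted variants are the ones to use.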

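The timings differ considerably. On the example repeated to 400,000 rows, factorize was roughly 40 times faster than rank in this run: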

s = pd.concat([s]*100000).reset_index(drop=True)

In [78]: %timeit (pd.Series(pd.factorize(s)[0] + 1, s.index))
100 loops, best of 3: 13.9 ms per loop

In [79]: %timeit (s.rank(method='dense').astype(int))
1 loop, best of 3: 536 ms per loop
jezrael
  • ***I like it*** – piRSquared Feb 01 '17 at 08:14
  • Thank you. Can I think of this as a general-purpose approach, that is, one that also works for all other data types, not only datetime? – splinter Feb 01 '17 at 08:16
  • Yes, it is a general approach, see [docs](http://pandas.pydata.org/pandas-docs/stable/reshaping.html#factorizing-values) – jezrael Feb 01 '17 at 08:17
  • Thank you! One last question: what if the unique indexing is needed for two columns? That is, instead of a Series, we have two columns and we want a unique index for each pair. – splinter Feb 02 '17 at 11:25
  • Then it is necessary to create a new Series. There are 2 possible solutions: [check here](http://stackoverflow.com/q/41974374/2901002) – jezrael Feb 02 '17 at 11:28
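
For the two-column case raised in the last comments, one possible approach is to number the groups formed by the pair of columns. This is a sketch under assumptions and not necessarily one of the solutions in the linked question: the DataFrame, its column names a and b, and the name pair_ids are made up for illustration, and the key columns are assumed to contain no missing values.

```python
import pandas as pd

# Hypothetical two-column data; each (a, b) pair should get one integer.
df = pd.DataFrame({
    "a": ["x", "y", "x", "y"],
    "b": [1, 2, 1, 3],
})

# ngroup numbers each group from 0; sort=False keeps first-appearance order.
pair_ids = df.groupby(["a", "b"], sort=False).ngroup() + 1

print(pair_ids.tolist())  # [1, 2, 1, 3]
```

The result is a Series aligned with df.index, so it can be assigned directly as a new column.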