
Suppose I have a dataframe with a Date column and an Id column. This is a time-series data set, so I need to generate a time-series identifier for the dataframe. That is, I need to add an identifier that says which occurrence of each unique date a row is (first, second, third, and so on). Is there a way to do this?

import pandas as pd

df = pd.DataFrame({'Date': ['2012-01-01', '2012-01-01', '2012-01-01', '2012-01-02',
                            '2012-01-02', '2012-01-03', '2012-01-03', '2012-01-03',
                            '2012-01-04', '2012-01-01', '2012-01-04'],
                   'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})
print(df)

Output:

   Date       Id
2012-01-01     1
2012-01-01     2
2012-01-01     3
2012-01-02     4
2012-01-02     5
2012-01-03     6
2012-01-03     7
2012-01-03     8
2012-01-04     9
2012-01-01     10
2012-01-04     11

I need to order the rows so that the first occurrence of each unique date comes first (TimeID 0), followed by the second occurrences (TimeID 1), and so on, like this:

   Date       Id      TimeID
2012-01-01     1         0
2012-01-02     4         0
2012-01-03     6         0
2012-01-04     9         0
2012-01-01     2         1
2012-01-02     5         1
2012-01-03     7         1
2012-01-04     11        1
2012-01-01     3         2
2012-01-03     8         2
2012-01-01     10        3
Firenze
  • Does this answer your question? [Pandas number rows within group in increasing order](https://stackoverflow.com/questions/37997668/pandas-number-rows-within-group-in-increasing-order) – smci Jul 01 '20 at 07:51
  • By the way, you're ordering the rows, not the column. (Ok so those are rows within the 'Date' column. But it counts as ordering rows) – smci Jul 01 '20 at 07:54

2 Answers


Use GroupBy.cumcount with DataFrame.sort_values:

df['TimeID'] = df.groupby('Date').cumcount()
df = df.sort_values('TimeID')
print(df)
          Date  Id  TimeID
0   2012-01-01   1       0
3   2012-01-02   4       0
5   2012-01-03   6       0
8   2012-01-04   9       0
1   2012-01-01   2       1
4   2012-01-02   5       1
6   2012-01-03   7       1
10  2012-01-04  11       1
2   2012-01-01   3       2
7   2012-01-03   8       2
9   2012-01-01  10       3
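
Note that if you want ties within the same TimeID to also come out in date order regardless of the original row positions, you can sort on both columns; a minimal variation of the above, assuming the same df:

df['TimeID'] = df.groupby('Date').cumcount()
df = df.sort_values(['TimeID', 'Date']).reset_index(drop=True)
print(df)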
jezrael

First, convert the string dates to datetimes with pd.to_datetime(). Then, use groupby() and .cumcount() as per this solution:

import pandas as pd

df = pd.DataFrame({'Date': ['2012-01-01', '2012-01-01', '2012-01-01', '2012-01-02',
                            '2012-01-02', '2012-01-03', '2012-01-03', '2012-01-03',
                            '2012-01-04', '2012-01-01', '2012-01-04'],
                   'Id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]})

# alternatively, the dates can be parsed directly at pd.read_csv() time via parse_dates
df['Date'] = pd.to_datetime(df['Date'])
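
For completeness, the remaining step is the same cumcount/sort shown in the linked question (and in jezrael's answer above); a minimal sketch applied to the converted frame:

df['TimeID'] = df.groupby('Date').cumcount()
df = df.sort_values('TimeID')
print(df)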
smci