0

I have a time series data which I want to find the sequence for a certain event based on the time and another category value.

Example:
Time    Event    Category    Seq
1         A         1         1 
2         A         1         2  
3         B         1         1
4         B         2         1
5         C         1         1
6         A         1         1

I searched and found that I can use groupby and cumcount() to do this but I'm not able to achieve the sequences at time 1 and time 6 in the above example (Event A of same category and different time should have a new sequence number, my code continues the sequence and outputs as 3)

df['sequence']=df.groupby('Event').cumcount() + 1

Related Question:

Pandas: conditional rolling count (considers the one-columnar case)

Azat Ibrakov
  • 9,998
  • 9
  • 38
  • 50
  • Grouping by `'Event'` and cumulative counting should give you `[1, 2, 1, 2, 1, 3]` as your new `sequence` column. If that is not what you expect, what do you expect instead? – Kay Wittig Jul 17 '18 at 07:01
  • 1
    I need the sequence to be grouped by the Event, Category and time. For example - For event A, category 1 the sequence for time 1 and 2 is 1 & 2 respectively. And the next occurrence of A1 at time 6 should be 1 again. Also for event B, for category 1 and 2 at times 3 and 4 should both be 1. I hope my explanation is clear now. Thanks in advance – Deepthi Suresh Jul 23 '18 at 01:00
  • Thanks for pointing that out. Your question is related to this one: https://stackoverflow.com/questions/25119524/pandas-conditional-rolling-count but since you ask for a multi-columnar case I will add answer for that. – Kay Wittig Jul 23 '18 at 07:01

1 Answers1

0

You can go like this. Given this data

df = pd.DataFrame({'Time': [1, 2, 3, 4, 5, 6],
                   'Event': list('AABBCA'),
                   'Category': [1, 1, 1, 2, 1, 1]})

you want to groupby 'Event' and 'Category' and then cumcount. In order to restart your count you can compare the consecutive rows by using shift(1) and persist the information using any(axis=1) and cumsum(). The total command then reads

>>> df['Seq'] = df.groupby((df[['Event', 'Category']] != df[['Event', 'Category']].shift(1)).any(axis=1).cumsum()).cumcount() + 1
>>> df
   Time Event  Category  Seq
0     1     A         1    1
1     2     A         1    2
2     3     B         1    1
3     4     B         2    1
4     5     C         1    1
5     6     A         1    1

Note: The 'Time' column seems to be only your index and might be skipped. Though I included it.

Kay Wittig
  • 563
  • 3
  • 18