1

I don't really sure what the proper terms in english for my question.

state | date
=============
'a'   | 2020-01-01
'a'   | 2020-01-02
'b'   | 2020-01-03
'b'   | 2020-01-04
'a'   | 2020-01-05
'a'   | 2020-01-06

The desired output is:

state | min_date    | max_date
================================
'a'   | 2020-01-01  | 2020-01-02
'b'   | 2020-01-03  | 2020-01-04
'a'   | 2020-01-05  | 2020-01-06

I don't really know even how do it (without a loop).

Help please?

Super Mario
  • 923
  • 10
  • 16
  • 3
    You are correct, this is rather hard to describe and find but is generally referred to as "grouping by consecutive values". For this example it's -> `df.groupby(df['state'].ne(df['state'].shift()).cumsum(), as_index=False).agg(state=('state','first'), min_date=('date', 'min'), max_date=('date', 'max'))` . Groups are enumerated by comparing state values to the previous row then using cumsum to enumerate the differences. You can print -> `df['state'].ne(df['state'].shift()).cumsum()` to see how the groups are created. – Henry Ecker Oct 17 '21 at 18:37
  • @HenryEcker Didn't see your comment before posting. Do you want to post your comment as an answer ? – Psidom Oct 17 '21 at 18:41
  • 1
    I actually closed the question with duplicates for creating groups. @Psidom Your answer is great though. I'm okay as it is. – Henry Ecker Oct 17 '21 at 18:42

1 Answers1

2

Try:

df.groupby((df.state != df.state.shift()).cumsum()).agg(
  state = ('state', 'first'), 
  min_date = ('date', 'min'), 
  max_date = ('date', 'max')
).reset_index(drop=True)

  state    min_date    max_date
0   'a'  2020-01-01  2020-01-02
1   'b'  2020-01-03  2020-01-04
2   'a'  2020-01-05  2020-01-06
Psidom
  • 209,562
  • 33
  • 339
  • 356