I have a dataframe that looks like this:

x     time   zone
1     10     a
3     11     a
5     12     b
7     13     b
8     14     a
9     18     a
10    20     a
11    22     c
12    24     c 

Imagine that zone is a state that changes over time. I would like to process each occurrence of a state individually so that I can calculate some metrics for it.

Basically, I want to divide the data frame into blocks of consecutive rows with the same zone, like this:

1st block:

x     time   zone
1     10     a
3     11     a

2nd block:

5     12     b
7     13     b

3rd block:

8     14     a
9     18     a
10    20     a

and so on. With this I can calculate metrics such as the time spent in each state, the difference in x, etc.

How can I accomplish this using pandas?

Thanks!

vftw
  • Why doesn't the 3rd block contain the row with x = 10 and what have you tried? –  Jan 05 '22 at 14:07
  • As an answer below notes, this is so classic that it has duplicate questions but who cares right? You don't attempt or show any research attempt, the answerer is here for reputation and reputation only anyway it seems, so I stop complaining and move on. –  Jan 05 '22 at 14:16
  • @Neither I am not gaining any reputation for this question (already capped), but please provide a duplicate if you have one. I'd be happy to close the question – mozway Jan 05 '22 at 14:19
  • @mozway It may get accepted and you get some points there, right? Anyway, sure, here are *some* duplicates https://stackoverflow.com/questions/40802800/how-to-groupby-consecutive-values-in-pandas-dataframe & https://stackoverflow.com/questions/23691133/split-pandas-dataframe-based-on-groupby. –  Jan 05 '22 at 14:21
  • I didn't expect this. Probably I didn't figure out what I wanted in the best manner and couldn't put it to the best keywords when searching. I will try to be more careful next time. Thank you for your help @mozway – vftw Jan 05 '22 at 14:37

1 Answer

The classic approach is to use the following formula to generate groups of consecutive values.

It works by marking a row as True whenever its value differs from the previous row's, then using cumsum to increment a counter at each change, so every run of identical consecutive values gets its own group number.

group = df['zone'].ne(df['zone'].shift()).cumsum()

output:

0    1
1    1
2    2
3    2
4    3
5    3
6    3
7    4
8    4
Name: zone, dtype: int64
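
To see where those numbers come from, the formula can be split into its intermediate steps. This is a minimal sketch that rebuilds the question's data frame; the names `shifted` and `changed` are only illustrative and not part of the original one-liner.

```python
import pandas as pd

# Data frame rebuilt from the question
df = pd.DataFrame({
    "x": [1, 3, 5, 7, 8, 9, 10, 11, 12],
    "time": [10, 11, 12, 13, 14, 18, 20, 22, 24],
    "zone": list("aabbaaacc"),
})

shifted = df["zone"].shift()       # zone of the previous row (missing for the first row)
changed = df["zone"].ne(shifted)   # True where the zone differs from the previous row
group = changed.cumsum()           # running count of changes, used as the block id

# Side-by-side view of each step
print(pd.DataFrame({"zone": df["zone"], "shifted": shifted,
                    "changed": changed, "group": group}))
```

The first row is always flagged as a change because it is compared against a missing value, which is why the group numbers start at 1.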

You can then pass it to groupby to split your data (shown here as a dictionary for illustration):

dict(list(df.groupby(group)))

output:

{1:    x  time zone
 0  1    10    a
 1  3    11    a,
 2:    x  time zone
 2  5    12    b
 3  7    13    b,
 3:     x  time zone
 4   8    14    a
 5   9    18    a
 6  10    20    a,
 4:     x  time zone
 7  11    22    c
 8  12    24    c}
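
For the metrics mentioned in the question, the same grouping key can feed an aggregation directly instead of a dictionary. The sketch below assumes that "time spent" means the last time minus the first time within a block, and that "x difference" means the last x minus the first x; both definitions are assumptions, as are the names `block`, `time_spent` and `x_diff`.

```python
import pandas as pd

# Data frame rebuilt from the question
df = pd.DataFrame({
    "x": [1, 3, 5, 7, 8, 9, 10, 11, 12],
    "time": [10, 11, 12, 13, 14, 18, 20, 22, 24],
    "zone": list("aabbaaacc"),
})

# Block id per row; renamed so it does not clash with the "zone" column
group = df["zone"].ne(df["zone"].shift()).cumsum().rename("block")

# One row per block with its first and last values
metrics = df.groupby(group).agg(
    zone=("zone", "first"),
    start_time=("time", "first"),
    end_time=("time", "last"),
    x_first=("x", "first"),
    x_last=("x", "last"),
)

# Assumed metric definitions: last minus first within each block
metrics["time_spent"] = metrics["end_time"] - metrics["start_time"]
metrics["x_diff"] = metrics["x_last"] - metrics["x_first"]

print(metrics[["zone", "time_spent", "x_diff"]])
```

If time spent should instead run until the start of the next block, the definition would need to change accordingly.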
mozway