How to separate sequential groups in pandas?

Question

The question might be odd, so here's an example.

Suppose my data looks like below:

As you can see, there are 3 elements (a, b, c), and we can simply group by them and aggregate:

a  8 
b  4 
c  16
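For reference, a plain groupby that gives these totals can be sketched as below. The sample frame is an assumption: the original table is not shown here, so the column names `A` and `B` are borrowed from the answer and the `B` values are made up to match the sums above.

```python
import pandas as pd

# Hypothetical sample data (the original table is not shown).
# Column names follow the answer; B values are invented to match the totals.
df = pd.DataFrame({
    "A": list("aaabbaccc"),
    "B": [1, 1, 2, 1, 3, 4, 5, 5, 6],
})

# An ordinary groupby merges every 'a' row into one group, wherever it appears.
totals = df.groupby("A")["B"].sum()
print(totals)
```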

But what I really want is to separate the consecutive runs within each group. So the desired result is:

This means that the first consecutive run of 'a' and the second 'a' run that appears later are treated as different groups. Let me illustrate with a list:

[[a,a,a],[b,b],[a],[c,c,c]]

The point is that order matters. pandas cumcount alone is not enough, since it only counts occurrences. I want to split each consecutive run into its own group, even when the runs share the same element (the first run of a and the second run of a are different groups).

I don't think this is straightforward in pandas, but I'd like to avoid writing explicit function-based logic. In short, I want a much simpler way to do this. Any ideas?

score 1 · Accepted Answer · answered Jul 27 '17 at 07:13


You can use the shift-cumsum pattern.

>>> df.groupby((df['A'] != df['A'].shift()).cumsum(), as_index=False).agg(
        {'A': 'first', 'B': 'sum'})
# Output:
#        A   B
#     0  a   4
#     1  b   4
#     2  a   4
#     3  c  16
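To see why this works, the pattern can be broken into steps. This is only a sketch on assumed data: the frame below is hypothetical (the question's table is not shown), and the names `starts` and `run_id` are introduced for illustration. It uses named aggregation with `reset_index` rather than `as_index=False`, which is a different spelling of the same idea.

```python
import pandas as pd

# Hypothetical sample data; values chosen so the run sums match the output above.
df = pd.DataFrame({
    "A": list("aaabbaccc"),
    "B": [1, 1, 2, 1, 3, 4, 5, 5, 6],
})

# Step 1: mark rows where the value differs from the previous row.
# The first row compares against NaN, so it always starts a run.
starts = df["A"] != df["A"].shift()

# Step 2: a cumulative sum of those flags gives each consecutive run its own id.
run_id = starts.cumsum().rename("run")

# Step 3: group by the run id instead of the value itself.
result = (
    df.groupby(run_id)
      .agg(A=("A", "first"), B=("B", "sum"))
      .reset_index(drop=True)
)
print(result)
```

Because the key is the run id and not the letter, the later 'a' row ends up in its own group instead of being merged with the first three.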


Alexander


It worked! Thank you so much and I love you. You're the god of pandas. – ritos Jul 27 '17 at 07:22

@DSM is one of the gods. I am still learning. https://stackoverflow.com/questions/33130586/python-pandas-creating-a-column-which-keeps-a-running-count-of-consecutive-val – Alexander Jul 27 '17 at 07:26
