2

The question might be odd, so here's an example.

Suppose my data looks like below:

A  B  
a  2
a  1 
a  1 
b  3 
b  1 
a  4
c  3
c  6
c  7

As you see there are 3 elements a, b, c and we can just groupby & aggregate like:

a  8 
b  4 
c  16

but what I really want to do is, I want to "SEPERATE" "within" groups. So the desired result is:

a  4 
b  4 
a  4
c  16

this means that the first sequential 'a' group and the second 'a' that appears later is treated different. Let me show you simply by a list.

[[a,a,a],[b,b],[a],[ccc]] 

The point is we have to consider the sequence. It's not just using pandas cumcount, which only considers the count of appearance. I want to divide each 'sequences', even within same element group.(first aaa and second aaa is different)

I don't think this is that simple in pandas, though I'm avoiding to do this job by just making explicit function-based logic. In short, I want to do this much more simply. Is there any idea?

Cœur
  • 37,241
  • 25
  • 195
  • 267
ritos
  • 641
  • 1
  • 5
  • 10

1 Answers1

1

You can use the shift-cumsum pattern.

>>> df.groupby((df['A'] != df['A'].shift()).cumsum(), as_index=False).agg(
        {'A': 'first', 'B': 'sum'})
# Output:
#        A   B
#     0  a   4
#     1  b   4
#     2  a   4
#     3  c  16
Alexander
  • 105,104
  • 32
  • 201
  • 196
  • 1
    It worked! Thank you so much and I love you. You're the god of pandas. – ritos Jul 27 '17 at 07:22
  • 1
    @DSM is one of the gods. I am still learning. https://stackoverflow.com/questions/33130586/python-pandas-creating-a-column-which-keeps-a-running-count-of-consecutive-val – Alexander Jul 27 '17 at 07:26