5

I want to do what they've done in the answer here: Calculating the number of specific consecutive equal values in a vectorized way in pandas , but using a grouped dataframe instead of a series.

So given a dataframe with several columns

A    B    C   
------------ 
x    x    0
x    x    5
x    x    2
x    x    0
x    x    0
x    x    3
x    x    0
y    x    1
y    x    10
y    x    0
y    x    5
y    x    0
y    x    0

I want to groupby columns A and B, then count the number of consecutive zeros in C. After that I'd like to return counts of the number of times each length of zeros occurred. So I want output like this:

A    B    num_consecutive_zeros  count
---------------------------------------
x    x            1                2
x    x            2                1
y    x            1                1
y    x            2                1

I don't know how to adapt the answer from the linked question to deal with grouped dataframes.

Community
  • 1
  • 1
user3591836
  • 953
  • 2
  • 16
  • 29
  • What the difference between the first two rows and the last two in your target output? Why aren't those grouped together. I'm not sure you've clearly explained the logic of what you are trying to achieve. – Alexander Apr 15 '15 at 03:09
  • @Alexander Take the first two rows (x x), the single zero pattern occurs 2 times in that grouping. the two consecutive zero pattern occurs one time in that grouping. The last two rows are for the (y x) grouping with the same logic. Make sense? – Bob Haffner Apr 15 '15 at 03:13
  • yes, exactly how @BobHaffner explained it. I should have chosen better values for the A & B columns for the toy example. – user3591836 Apr 15 '15 at 04:06
  • 1
    So if the values of the first seven rows (all A='x') were to appear consecutively one group after the other, then the desired output for num_consecutive_zeros would be 1, 2, 1, 2 with count=2,1,2,1? Or would it be 2, 4 and 4,2, respectively? – Alexander Apr 15 '15 at 05:09

1 Answers1

1

Here is the code, count_consecutive_zeros() use numpy functions and pandas.value_counts() to get the results, and use groupby().apply(count_consecutive_zeros) to call count_consecutive_zeros() for every group. call reset_index() to change MultiIndex to columns:

import pandas as pd
import numpy as np
from io import BytesIO
text = """A    B    C   
x    x    0
x    x    5
x    x    2
x    x    0
x    x    0
x    x    3
x    x    0
y    x    1
y    x    10
y    x    0
y    x    5
y    x    0
y    x    0"""

df = pd.read_csv(BytesIO(text.encode()), delim_whitespace=True)

def count_consecutive_zeros(s):
    v = np.diff(np.r_[0, s.values==0, 0])
    s = pd.value_counts(np.where(v == -1)[0] - np.where(v == 1)[0])
    s.index.name = "num_consecutive_zeros"
    s.name = "count"
    return s

df.groupby(["A", "B"]).C.apply(count_consecutive_zeros).reset_index()
kiril
  • 4,914
  • 1
  • 30
  • 40
HYRY
  • 94,853
  • 25
  • 187
  • 187