cumcount groupby --- how to list by groups

Question

My question is related to this question

import pandas as pd
df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])
df['CNT'] = df.groupby(['c1', 'c2']).cumcount()+1

This gives me the column 'CNT', but I'd like to split it by the 'c2' group to get separate cumulative counts for 'X' and 'Y':

    c1  c2  v1  CNT Xcnt Ycnt
0   A   X   3   1   1   0
1   A   X   5   2   2   0
2   A   Y   7   1   2   1
3   A   Y   1   2   2   2
4   B   X   3   1   1   0
5   B   X   1   2   2   0
6   B   X   3   3   3   0
7   B   Y   1   1   3   1
8   C   X   7   1   1   0
9   C   Y   4   1   1   1
10  C   Y   1   2   1   2
11  C   Y   6   3   1   3

Any suggestions? I am just starting to explore Pandas and appreciate your help.

score 1 · Accepted Answer · answered May 23 '14 at 18:10

I don't know a way to do this directly, but starting from the calculated CNT column, you can do it as follows:

Make the Xcnt and Ycnt columns:

In [13]: df['Xcnt'] = df['CNT'][df['c2']=='X']

In [14]: df['Ycnt'] = df['CNT'][df['c2']=='Y']

In [15]: df
Out[15]:
   c1 c2  v1  CNT  Xcnt  Ycnt
0   A  X   3    1     1   NaN
1   A  X   5    2     2   NaN
2   A  Y   7    1   NaN     1
3   A  Y   1    2   NaN     2
4   B  X   3    1     1   NaN
5   B  X   1    2     2   NaN
6   B  X   3    3     3   NaN
7   B  Y   1    1   NaN     1
8   C  X   7    1     1   NaN
9   C  Y   4    1   NaN     1
10  C  Y   1    2   NaN     2
11  C  Y   6    3   NaN     3

Next, we want to fill the NaN's per group of c1 by forward filling:

In [23]: df['Xcnt'] = df.groupby('c1')['Xcnt'].fillna(method='ffill')

In [24]: df['Ycnt'] = df.groupby('c1')['Ycnt'].fillna(method='ffill').fillna(0)

In [25]: df
Out[25]:
   c1 c2  v1  CNT  Xcnt  Ycnt
0   A  X   3    1     1     0
1   A  X   5    2     2     0
2   A  Y   7    1     2     1
3   A  Y   1    2     2     2
4   B  X   3    1     1     0
5   B  X   1    2     2     0
6   B  X   3    3     3     0
7   B  Y   1    1     3     1
8   C  X   7    1     1     0
9   C  Y   4    1     1     1
10  C  Y   1    2     1     2
11  C  Y   6    3     1     3

For Ycnt, an extra fillna was needed to convert the NaN's to 0's where a group started with NaNs (which couldn't be filled forward).
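On recent pandas versions, `fillna(method='ffill')` and `fillna` on a groupby object are deprecated, so the session above may emit warnings or fail. A minimal sketch of the same two steps, assuming a pandas version that provides `groupby(...).ffill()`, could look like this (the `masked` variable and the final cast to `int` are my additions, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])
df['CNT'] = df.groupby(['c1', 'c2']).cumcount() + 1

for value, col in [('X', 'Xcnt'), ('Y', 'Ycnt')]:
    # Keep CNT only on rows where c2 matches; all other rows become NaN.
    masked = df['CNT'].where(df['c2'] == value)
    # Forward fill within each c1 group, then turn leading NaNs into 0.
    df[col] = masked.groupby(df['c1']).ffill().fillna(0).astype(int)

print(df)
```

Applying `fillna(0)` to both columns is harmless for Xcnt here, and it covers the case where a `c1` group does not start with an 'X' row.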

answered May 23 '14 at 18:10

joris


Thanks. Good tips. I am hoping for more general way to do it. In my real data set, c2 would have more than two values (and not known ahead of time). – Dong May 23 '14 at 18:27
You could loop over the unique values of column `c2` to perform these operations for each value. That could be quite general and automatic. – joris May 23 '14 at 18:37
Got it. Thanks. `for cat in df['c2']: df[cat] = df['CNT'][df['c2']==cat] df[cat] = df.groupby('c1')[cat].fillna(method='ffill').fillna(0)` – Dong May 23 '14 at 22:24
Yes, but I think `for cat in df['c2'].unique(): ...` is enough (otherwise you would repeat it multiple times for the same value of `c2`) – joris May 23 '14 at 22:28
`df['CNT'][df['c2']=='X']` may be better as `df.loc[df['c2'] == 'X', 'CNT']` – Andy Hayden May 24 '14 at 06:51
Yes, indeed, that's the better idiom! (although in this case it doesn't matter I suppose, as it is not used for assignment and the values will be copied to the new column in any case?) – joris May 24 '14 at 07:39
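The loop discussed in the comments can be sketched as a small helper for the case where the values of `c2` are not known ahead of time. The function name `add_group_counts`, its parameters, and the `<value>cnt` column naming are hypothetical choices for illustration; it also uses `groupby(...).ffill()` rather than `fillna(method='ffill')`, assuming a recent pandas version:

```python
import pandas as pd

def add_group_counts(df, outer='c1', inner='c2'):
    """Add one running-count column per unique value of `inner`,
    restarted within each `outer` group (hypothetical helper)."""
    out = df.copy()
    # Running count of each (outer, inner) pair, starting at 1.
    cnt = out.groupby([outer, inner]).cumcount() + 1
    for cat in out[inner].unique():
        # Keep the count only on rows of this category.
        masked = cnt.where(out[inner] == cat)
        # Carry the last count forward within each outer group; 0 before the first hit.
        out[f'{cat}cnt'] = masked.groupby(out[outer]).ffill().fillna(0).astype(int)
    return out

example = pd.DataFrame({'c1': ['A', 'A', 'A', 'B', 'B'],
                        'c2': ['X', 'Y', 'Z', 'Z', 'X']})
result = add_group_counts(example)
print(result)
```

Iterating over `unique()` runs each category once, as joris points out, and working on a copy leaves the caller's frame unchanged.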
