
My question is related to this question

import pandas as pd
df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])
df['CNT'] = df.groupby(['c1', 'c2']).cumcount()+1

This gives me the column 'CNT'. But I'd like to split it by the values of 'c2', so that I get a separate cumulative count for 'X' and for 'Y' within each 'c1' group:

    c1  c2  v1  CNT Xcnt Ycnt
0   A   X   3   1   1   0
1   A   X   5   2   2   0
2   A   Y   7   1   2   1
3   A   Y   1   2   2   2
4   B   X   3   1   1   0
5   B   X   1   2   2   0
6   B   X   3   3   3   0
7   B   Y   1   1   3   1
8   C   X   7   1   1   0
9   C   Y   4   1   1   1
10  C   Y   1   2   1   2
11  C   Y   6   3   1   3

Any suggestions? I am just starting to explore Pandas and appreciate your help.

Dong

1 Answer


I don't know of a direct way to do this, but starting from the calculated CNT column, you can do it as follows:

First, create the Xcnt and Ycnt columns by keeping CNT only on the rows where c2 matches:

In [13]: df['Xcnt'] = df['CNT'][df['c2']=='X']

In [14]: df['Ycnt'] = df['CNT'][df['c2']=='Y']

In [15]: df
Out[15]:
   c1 c2  v1  CNT  Xcnt  Ycnt
0   A  X   3    1     1   NaN
1   A  X   5    2     2   NaN
2   A  Y   7    1   NaN     1
3   A  Y   1    2   NaN     2
4   B  X   3    1     1   NaN
5   B  X   1    2     2   NaN
6   B  X   3    3     3   NaN
7   B  Y   1    1   NaN     1
8   C  X   7    1     1   NaN
9   C  Y   4    1   NaN     1
10  C  Y   1    2   NaN     2
11  C  Y   6    3   NaN     3

Next, forward-fill the NaNs within each c1 group:

In [23]: df['Xcnt'] = df.groupby('c1')['Xcnt'].fillna(method='ffill')

In [24]: df['Ycnt'] = df.groupby('c1')['Ycnt'].fillna(method='ffill').fillna(0)

In [25]: df
Out[25]:
   c1 c2  v1  CNT  Xcnt  Ycnt
0   A  X   3    1     1     0
1   A  X   5    2     2     0
2   A  Y   7    1     2     1
3   A  Y   1    2     2     2
4   B  X   3    1     1     0
5   B  X   1    2     2     0
6   B  X   3    3     3     0
7   B  Y   1    1     3     1
8   C  X   7    1     1     0
9   C  Y   4    1     1     1
10  C  Y   1    2     1     2
11  C  Y   6    3     1     3

For Ycnt, an extra fillna(0) was needed to convert the NaNs to 0s where a group started with NaNs (there was nothing to fill forward from).
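For reference, the two steps above can be sketched as one self-contained script. This is only a sketch under some assumptions: it uses `.loc` for the selection and `GroupBy.ffill()` instead of `fillna(method='ffill')`, since recent pandas releases deprecate the `method` argument, and it casts the result back to integers, which the session above does not do.

```python
import pandas as pd

df = pd.DataFrame(
    [['A', 'X', 3], ['A', 'X', 5], ['A', 'Y', 7], ['A', 'Y', 1],
     ['B', 'X', 3], ['B', 'X', 1], ['B', 'X', 3], ['B', 'Y', 1],
     ['C', 'X', 7], ['C', 'Y', 4], ['C', 'Y', 1], ['C', 'Y', 6]],
    columns=['c1', 'c2', 'v1'])

# Running count within each (c1, c2) pair, starting at 1.
df['CNT'] = df.groupby(['c1', 'c2']).cumcount() + 1

# Step 1: keep CNT only on the matching rows; the assignment aligns on
# the index, so all other rows become NaN.
df['Xcnt'] = df.loc[df['c2'] == 'X', 'CNT']
df['Ycnt'] = df.loc[df['c2'] == 'Y', 'CNT']

# Step 2: forward-fill within each c1 group, then replace the NaNs that
# remain at the start of a group with 0. The integer cast is an addition.
df['Xcnt'] = df.groupby('c1')['Xcnt'].ffill().fillna(0).astype(int)
df['Ycnt'] = df.groupby('c1')['Ycnt'].ffill().fillna(0).astype(int)

print(df)
```

Applying `fillna(0)` to Xcnt as well is harmless here and covers groups that do not start with an 'X' row.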

joris
  • Thanks. Good tips. I am hoping for a more general way to do it. In my real data set, c2 would have more than two values (and they are not known ahead of time). – Dong May 23 '14 at 18:27
  • You could loop over the unique values of column `c2` to perform these operations for each value. That could be quite general and automatic. – joris May 23 '14 at 18:37
  • Got it. Thanks. `for cat in df['c2']: df[cat] = df['CNT'][df['c2']==cat] df[cat] = df.groupby('c1')[cat].fillna(method='ffill').fillna(0)` – Dong May 23 '14 at 22:24
  • Yes, but I think `for cat in df['c2'].unique(): ...` is enough (otherwise you would repeat it multiple times for the same value of `c2`) – joris May 23 '14 at 22:28
  • `df['CNT'][df['c2']=='X']` may be better as `df.loc[df['c2'] == 'X', 'CNT']` – Andy Hayden May 24 '14 at 06:51
  • Yes, indeed, that's the better idiom! (Although in this case I suppose it doesn't matter, as it is not used for assignment and the values will be copied to the new column in any case?) – joris May 24 '14 at 07:39
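
The loop over unique values discussed in the comments could be wrapped up roughly as follows. This is a sketch, not code from the thread: the helper name `add_cumulative_counts`, its `suffix` parameter, and the `<value>cnt` column naming are assumptions made for illustration, and `GroupBy.ffill()` stands in for `fillna(method='ffill')`.

```python
import pandas as pd


def add_cumulative_counts(df, group_col, cat_col, suffix='cnt'):
    """Return a copy of df with one running-count column per value of cat_col.

    Hypothetical helper; the new columns are named '<value><suffix>'.
    """
    out = df.copy()
    # Running count within each (group, category) pair, starting at 1.
    cnt = out.groupby([group_col, cat_col]).cumcount() + 1
    for cat in out[cat_col].unique():
        # Keep the count only on rows of this category; NaN elsewhere.
        masked = cnt.where(out[cat_col] == cat)
        # Forward-fill within each group; gaps at the start of a group become 0.
        filled = masked.groupby(out[group_col]).ffill().fillna(0)
        out[f'{cat}{suffix}'] = filled.astype(int)
    return out


# Small example with three categories that are not listed in advance.
df = pd.DataFrame(
    [['A', 'X'], ['A', 'Z'], ['A', 'X'], ['B', 'Y'], ['B', 'Z'], ['B', 'Y']],
    columns=['c1', 'c2'])
result = add_cumulative_counts(df, 'c1', 'c2')
print(result)
```

Because the new column names are derived from the values of `c2`, a value that happens to produce the name of an existing column would overwrite that column; a different `suffix` avoids this.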