python pandas groupby() result

Question

I have the following python pandas data frame:

df = pd.DataFrame( {
   'A': [1,1,1,1,2,2,2,3,3,4,4,4],
   'B': [5,5,6,7,5,6,6,7,7,6,7,7],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1]
    } );

df
    A  B  C
0   1  5  1
1   1  5  1
2   1  6  1
3   1  7  1
4   2  5  1
5   2  6  1
6   2  6  1
7   3  7  1
8   3  7  1
9   4  6  1
10  4  7  1
11  4  7  1

I would like to have another column storing a value of a sum over C values for fixed (both) A and B. That is, something like:

    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2

I have tried with pandas groupby and it kind of works:

res = {}
for a, group_by_A in df.groupby('A'):
    group_by_B = group_by_A.groupby('B', as_index = False)
    res[a] = group_by_B['C'].sum()

but I don't know how to 'get' the results from res into df in the orderly fashion. Would be very happy with any advice on this. Thank you.

Andy Hayden · Accepted Answer · 2013-07-16T00:53:53.820

26

Here's one way (though it feels this should work in one go with an apply, I can't get it).

In [11]: g = df.groupby(['A', 'B'])

In [12]: df1 = df.set_index(['A', 'B'])

The size groupby function is the one you want, we have to match it to the 'A' and 'B' as the index:

In [13]: df1['D'] = g.size()  # unfortunately this doesn't play nice with as_index=False
# Same would work with g['C'].sum()

In [14]: df1.reset_index()
Out[14]:
    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2

edited Jul 16 '13 at 00:53

answered Jul 16 '13 at 00:48

Andy Hayden

359,921
101
625
535

Thank you @Andy Hayden! The solution with `sum` is more generic I think. In fact, I don't have `1`'s in `C` (when `size` works perfectly, as you pointed out in your solution) but rather some floats, so to make that work properly I need to go with `sum`. But anyway, brilliant, thank you again. – Simon Righley Jul 16 '13 at 01:05
6

I think the one-liner you were dreaming of is ``df['D'] = df.groupby(['A', 'B']).transform(np.size)``. In good times and bad, ``transform`` is there. :-D – Dan Allan Jul 16 '13 at 02:31
@DanAllan, how is column D being calculated? – Mazz Dec 10 '19 at 10:09

score 15 · Answer 2 · answered Dec 30 '15 at 14:54

15

You could also do a one liner using transform applied to the groupby:

df['D'] = df.groupby(['A','B'])['C'].transform('sum')

answered Dec 30 '15 at 14:54

DrTRD

1,641
1
13
18

score 8 · Answer 3 · answered Aug 20 '13 at 17:55

8

You could also do a one liner using merge as follows:

df = df.merge(pd.DataFrame({'D':df.groupby(['A', 'B'])['C'].size()}), left_on=['A', 'B'], right_index=True)

answered Aug 20 '13 at 17:55

andrew

1,843
20
19

score 2 · Answer 4 · edited Sep 21 '21 at 19:41

2

you can use this method :

columns = ['col1','col2',...]
df.groupby('col')[columns].sum()

if you want you can also use .sort_values(by = 'colx', ascending = True/False) after .sum() to sort the final output by a specific column (colx) and in an ascending or descending order.

edited Sep 21 '21 at 19:41

NuLo

1,298
1
11
16

answered Sep 21 '21 at 09:38

Mohsen

31
2

python pandas groupby() result

4 Answers4

Linked

Related