Group by with sum conditions

Question

I have the following df and I'd like to group it by Date & Ref but with sum conditions.

In this respect I'd need to group by Date & Ref and sum 'Q' column only if P is >= than PP.

df = DataFrame({'Date' : ['1', '1', '1', '1'],
                'Ref' : ['one', 'one', 'two', 'two'],
                'P' : ['50', '65', '30', '38'],
                'PP' : ['63', '63', '32', '32'],
                'Q' : ['10', '15', '20', '10']})

df.groupby(['Date','Ref'])['Q'].sum() #This does the right grouping byt summing the whole column
df.loc[df['P'] >= df['PP'], ('Q')].sum() #this has the right sum condition, but does not divide between Date & Ref

Is there a way to do that? Many thanks in advance

score 5 · Answer 1 · answered Jan 24 '19 at 11:17

5

Just filter prior to grouping:

In[15]:
df[df['P'] >= df['PP']].groupby(['Date','Ref'])['Q'].sum()

Out[15]: 
Date  Ref
1     one    15
      two    10
Name: Q, dtype: object

This reduces the size of the df in the first place so will speed up the groupby operation

answered Jan 24 '19 at 11:17

EdChum

376,765
198
813
562

Many thanks Ed. And for dividing this grouping by the total Q (no conditional)? Is there an easy way of doing it? – Rose Jan 24 '19 at 16:08
If you have a new question then you should post a new question, answering new questions in comments is counter-productive. Besides are you wanting to applying multiple functions to a `groupby`? You may want to read the [docs](http://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once) – EdChum Jan 24 '19 at 16:15

Dani Mesejo · Answer 2 · 2019-01-24T11:35:44.253

You could do:

import pandas as pd

df = pd.DataFrame({'Date' : ['1', '1', '1', '1'],
                'Ref' : ['one', 'one', 'two', 'two'],
                'P' : ['50', '65', '30', '38'],
                'PP' : ['63', '63', '32', '32'],
                'Q' : ['10', '15', '20', '10']})

def conditional_sum(x):
    return x[x['P'] >= x['PP']].Q.sum()

result = df.groupby(['Date','Ref']).apply(conditional_sum)

print(result)

Output

Date  Ref
1     one    15
      two    10
dtype: object

UPDATE

If you want to sum multiple columns in the output, you could use loc:

def conditional_sum(x):
    return x.loc[x['P'] >= x['PP'], ['Q', 'P']].sum()


result = df.groupby(['Date', 'Ref']).apply(conditional_sum)

print(result)

Output

             Q     P
Date Ref            
1    one  15.0  65.0
     two  10.0  38.0

Note that in the example above I used column P for the sake of showing how to do it with multiple columns.

Many thanks Daniel! And if a want to sum a list of columns (Q, R, S) for instance? How do I include it in return x[x['P'] >= x['PP']].Q.sum() ? — Rose, Jan 24 '19 at 11:32

Group by with sum conditions

2 Answers2