Conditional counts in pandas group by

Question

I have the following dataframe

I want to reformat it in the following way:

Group by name/account/monthly periods
Average (mean) balance to two decimal places
Average (mean) for transactions to no decimal places
Count of days where balance < 0
Count of days where Balance > Max credit

So the I apply the following function to make a Series of all the aggregations, and use the Series index as labels for the new columns:

 def f(x):
    d = {}
    d['Avg_Transactions'] = x['Transaction'].mean().round()
    d['Avg_Balance'] = x['Balance'].mean().round(2)
    d['Zero_Balance_days'] = (x['Balance'] < 0).count() 
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count()
    return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days'])

month = df.groupby(['Account','Name', 'Month']).apply(f)

Which gives the following output:

But I am getting the conditional counts wrong and instead counting all days. What would be the proper way to write these?

Please do not include images of your dataframes which would require someone to manually transcribe the dataframes from the images, but rather include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. — Henry Ecker, Jun 12 '21 at 17:16
Use `d['Zero_Balance_days'] = (x['Balance'] < 0).sum() ` instead ? That is, `sum()` instead of `count()` Similarly for the other field — SeaBean, Jun 12 '21 at 17:19
Hi @bran, and welcome to SO. Please [convert your images](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) to something that’s usable − so that other users can help you ! — Cimbali, Jun 12 '21 at 17:22

score 1 · Accepted Answer · answered Jun 12 '21 at 17:25

1

You can try replace your 2 lines with .count() to .sum(), as follows:

d['Zero_Balance_days'] = (x['Balance'] < 0).sum() 
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()

.count() returns number of non-NA/null observations in the Series of boolean index while both True/False are not NA/null and will be counted as well.

.sum() returns the sum of entries of True since True is interpreted as 1 while False is interpreted as 0 in the summation.

answered Jun 12 '21 at 17:25

SeaBean

22,547
3
13
25

@bran Please consider upvoting my answer if it proved helpful to you (see [How to upvote on Stack Overflow?](https://meta.stackexchange.com/questions/173399/how-can-i-upvote-answers-and-comments)). – SeaBean Jun 12 '21 at 17:35

Conditional counts in pandas group by

1 Answers1