0

I have the following dataframe enter image description here

I want to reformat it in the following way:

  1. Group by name/account/monthly periods
  2. Average (mean) balance to two decimal places
  3. Average (mean) for transactions to no decimal places
  4. Count of days where balance < 0
  5. Count of days where Balance > Max credit

So the I apply the following function to make a Series of all the aggregations, and use the Series index as labels for the new columns:

 def f(x):
    d = {}
    d['Avg_Transactions'] = x['Transaction'].mean().round()
    d['Avg_Balance'] = x['Balance'].mean().round(2)
    d['Zero_Balance_days'] = (x['Balance'] < 0).count() 
    d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).count()
    return pd.Series(d, index=['Avg_Transactions', 'Avg_Balance', 'Zero_Balance_days', 'Over_Credit_days'])

month = df.groupby(['Account','Name', 'Month']).apply(f)

Which gives the following output: enter image description here

But I am getting the conditional counts wrong and instead counting all days. What would be the proper way to write these?

bran
  • 85
  • 2
  • 8
  • Please do not include images of your dataframes which would require someone to manually transcribe the dataframes from the images, but rather include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. – Henry Ecker Jun 12 '21 at 17:16
  • 1
    Use `d['Zero_Balance_days'] = (x['Balance'] < 0).sum() ` instead ? That is, `sum()` instead of `count()` Similarly for the other field – SeaBean Jun 12 '21 at 17:19
  • Hi @bran, and welcome to SO. Please [convert your images](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-errors-when-asking-a-question) to something that’s usable − so that other users can help you ! – Cimbali Jun 12 '21 at 17:22

1 Answers1

1

You can try replace your 2 lines with .count() to .sum(), as follows:

d['Zero_Balance_days'] = (x['Balance'] < 0).sum() 
d['Over_Credit_days'] = (x['Balance'] > x['Max Credit']).sum()

.count() returns number of non-NA/null observations in the Series of boolean index while both True/False are not NA/null and will be counted as well.

.sum() returns the sum of entries of True since True is interpreted as 1 while False is interpreted as 0 in the summation.

SeaBean
  • 22,547
  • 3
  • 13
  • 25
  • @bran Please consider upvoting my answer if it proved helpful to you (see [How to upvote on Stack Overflow?](https://meta.stackexchange.com/questions/173399/how-can-i-upvote-answers-and-comments)). – SeaBean Jun 12 '21 at 17:35