0
import pandas as pd

df = pd.DataFrame({
       'customer': [1,2,1,3,1,2,3],
       "group_code": ['111', '111', '222', '111', '111', '111', '333'],
       "ind_code": ['A', 'B', 'AA', 'A', 'AAA', 'C', 'BBB'],
       "amount": [100, 200, 140, 400, 225, 125, 600],
       "card": ['XXX', 'YYY', 'YYY', 'XXX', 'XXX', 'YYY', 'XXX']})

With the above data frame , I wanted the output as below :

For each card number, I wanted the below records :

Card number, % of Amount spent of Group code 1, % of Amount spent on Group code 2, ….so on for different Group code

% of Amount spent on any group = (Total amount spend on the card / Amount spend on that group ) * 100

Also, on larger picture, I wanted to know the Top 5 Groups for each card where the amount is spent ?

It's basically 2 queries , It will be great if anyone can help me.

Note : The code given is just for understanding how my data frame looks like.

rdj7
  • 1,905
  • 4
  • 18
  • 33
Aysha
  • 1
  • 2

1 Answers1

0

Regarding the first query: first we get the total amount spent for each card:

card_totals = df.groupby('card').sum()['amount'].reset_index().to_dict(orient='list')
card_totals_dict = dict(zip(card_totals['card'], card_totals['amount']))
card_totals_dict

Output:

{'XXX': 1325, 'YYY': 465}

Then we calculate the percentage for each group:

group_percentage = df.groupby(['card', 'group_code']).sum()['amount'].reset_index()
group_percentage['percentage'] = group_percentage['amount'] * 100 / group_percentage['card'].apply(card_totals_dict.get)
group_percentage

Output:

card group_code  amount  percentage
0  XXX        111     725    54.7170
1  XXX        333     600    45.2830
2  YYY        111     325    69.8925
3  YYY        222     140    30.1075

Regarding the second query, it sounds very similar to this question, so I would say:

df.groupby(['card', 'group_code']).agg({'amount': sum})['amount'].groupby(level=0, group_keys=False).nlargest(5)

Using nlargest(1) returns

card  group_code
XXX   111           725
YYY   111           325
Name: amount, dtype: int64
andersource
  • 799
  • 3
  • 9