0

I'd like to calculate subtotals on this dataset:

Country  Channel    Genre  Size   Prev
UK       Partners   Blues    25     20
UK       Stores     Blues    15     10
UK       Stores     Rock     35     30
US       Stores     Rock     45     40
UK       Partners   Rock     55     50
US       Partners   Rock     65     60
UK       Stores     Blues     5      2

This solution calculates subtotals, but I need each level sorted differently. Specifically:

>>> columns = ['Country', 'Channel', 'Genre']
>>> sort = {'Country': 'Country', 'Channel': 'Size', 'Genre': 'Prev'}
>>> subtotal(data, columns, agg='sum', sort=sort)
    Country  Channel    Genre  Size  Prev
0   UK       Stores     Blues   20    12
1   UK       Stores     Rock    35    30
2   UK       Stores             55    42
3   UK       Partners   Blues   25    20
4   UK       Partners   Rock    55    50
5   UK       Partners           80    70
6   UK                         135   112
7   US       Stores     Rock    45    40
8   US       Stores             45    40
9   US       Partners   Rock    65    60
10  US       Partners           65    60
11  US                         110   100
12                             245   212

In this:

  • The "Country" column is sorted in ascending order of Country name (UK before US)
  • The "Channel" column is sorted (within Country) in ascending order of Size (UK Stores = 55 before UK Partners = 80)
  • The "Genre" column is sorted (within Country & Genre) in ascending order of Prev (UK Stores Blues = 20 before UK Stores Rock = 35)

How can we do this efficiently?

S Anand
  • 11,364
  • 2
  • 28
  • 23

1 Answers1

0

It is a very primitive step-by-step script, but after aggregating the data for duplicate attributes, it adds the total sum. It then combines and sorts the grouped rows for each attribute.

df = df.groupby(['Country','Channel','Genre'], as_index=False)[['Size','Prev']].sum()
df.loc['all_total'] = df.sum(numeric_only=True, axis=0)
sub1 = df.groupby('Country', as_index=False)[['Size','Prev']].sum()
sub2 = df.groupby(['Country','Channel'], as_index=False)[['Size','Prev']].sum()
all_total = pd.concat([df, sub1, sub2], axis=0)
all_total.sort_values(['Country','Channel'], ascending=[True,False],axis=0).fillna('')

    Country     Channel     Genre   Size    Prev
2   UK  Stores  Blues      20.0     12.0
3   UK  Stores  Rock       35.0     30.0
1   UK  Stores             55.0     42.0
0   UK  Partners    Blues   25.0    20.0
1   UK  Partners    Rock    55.0    50.0
0   UK  Partners           80.0     70.0
0   UK                    135.0     112.0
5   US  Stores  Rock      45.0  40.0
3   US  Stores            45.0  40.0
4   US  Partners    Rock    65.0    60.0
2   US  Partners           65.0     60.0
1   US                     110.0    100.0
all_total                  245.0    212.0
r-beginners
  • 31,170
  • 3
  • 14
  • 32