Python - trouble pivoting, grouping, and summing dataframe columns

Question

I have this code. I need to group by CustomerName and then sum the filegroups.

def consolidated_df():
    df = breakdown_df()
    df.pivot_table(index='CustomerName', columns='FileGroup', aggfunc="sum")
    return df

breakdown_df() looks like

ID    CustomerName    FileGroup    Size        Size(Bytes)
1     CustomerA       Database     99.8 M      104667648
1     CustomerA       Database     99.8 M      104667648
1     CustomerA       Backup       99.8 M      104667648
1     CustomerA       Backup       99.8 M      104667648
1     CustomerA       Site         99.8 M      104667648
1     CustomerA       Site         99.8 M      104667648
2     CustomerB       Database     99.8 M      104667648
2     CustomerB       Database     99.8 M      104667648
2     CustomerB       Backup       99.8 M      104667648
2     CustomerB       Backup       99.8 M      104667648
2     CustomerB       Site         99.8 M      104667648
2     CustomerB       Site         99.8 M      104667648

I am trying to roll it up into

ID    CustomerName    DatabaseSize    DatabaseSizeBytes    BackupSize     BackupSizeBytes     SiteSize            SiteSizeByte         TotalSize
1     CustomerA       [Total Size]    [Total Size Bytes]   [TotalSize]    [Total Size Bites]  [Total Site Size]   [Total Site Bites]   [Total Bytes for everything]
2     CustomerB       [Total Size]    [Total Size Bytes]   [TotalSize]    [Total Size Bites]  [Total Site Size]   [Total Site Bites]   [Total Bytes for everything]

I'm not so worried about actually summing Size because I can convert the bites. I just can't seem to get my pivot to work and unsure where I am going wrong.

BeRT2me · Accepted Answer · 2022-11-20T23:36:32.280

1

If you don't explicitly set values, it'll try to use all remaining columns...

out = df.pivot_table(index='CustomerName', columns='FileGroup', values='Size(Bytes)', aggfunc='sum')
print(out)

Output:

FileGroup        Backup   Database       Site
CustomerName                                 
CustomerA     209335296  209335296  209335296
CustomerB     209335296  209335296  209335296

You can also have margins if desired:

df.pivot_table(index='CustomerName', 
               columns='FileGroup', 
               values='Size(Bytes)', 
               aggfunc='sum', 
               margins=True, 
               margins_name='TotalSize').drop('TotalSize')

# Output:

FileGroup        Backup   Database       Site  TotalSize
CustomerName                                            
CustomerA     209335296  209335296  209335296  628005888
CustomerB     209335296  209335296  209335296  628005888

edited Nov 20 '22 at 23:36

answered Nov 20 '22 at 23:32

BeRT2me

12,699
2
13
31

ID can't be an index because it won't always be present. I'll test the above and also `df.pivot_table(index='FileGroup',aggfunc='sum')`. – uncrayon Nov 20 '22 at 23:35
@DuckButts Updated~ – BeRT2me Nov 20 '22 at 23:38
It works, thanks. I just need to navigate how the columns are. Instead of `FileGroup, Backup, Database, Site at the very top - I need it to just be like CustomerName, Backup, Database, Site as the headings. Thanks again! – uncrayon Nov 20 '22 at 23:49
`FileGroup` is your `df.columns.name` attribute which you can set to `None` if desired, and you can use `.reset_index()` to move `CustomerName` from your index to your columns~ – BeRT2me Nov 20 '22 at 23:52
I got it and came back to say it but you beat me - `red = df.pivot_table(index=['AccountExternalID','CustomerName'], columns='FileGroup', values='Size (Bytes)', aggfunc='sum').reset_index()` Thanks! – uncrayon Nov 20 '22 at 23:53
I just noticed something - this isn't summing - it is concatenating the bytes together. So like `104667648104667648`. – uncrayon Nov 21 '22 at 00:01
Then your bytes are incorrectly formatted as strings. Use `df['Size(Bytes)'] = pd.to_numeric(df['Size(Bytes)'])` to convert them~ – BeRT2me Nov 21 '22 at 00:02

score 1 · Answer 2 · answered Nov 20 '22 at 23:40

With help of another StackOverflow answer:

# https://stackoverflow.com/a/23773174/10035985
def sizeof_fmt(num, use_kibibyte=True):
    base, suffix = [(1000.0, "B"), (1024.0, "iB")][use_kibibyte]
    for x in ["B", *map(lambda x: x + suffix, list("kMGTP"))]:
        if -base < num < base:
            return "%3.1f %s" % (num, x)
        num /= base
    return "%3.1f %s" % (num, x)


x = df.pivot_table(
    index=["ID", "CustomerName"],
    columns=["FileGroup"],
    values="Size(Bytes)",
    aggfunc="sum",
)

columns_to_convert = list(x.columns)
x = pd.concat([x, x.add_suffix("SizeBytes")], axis=1)

x["TotalSize"] = x[columns_to_convert].sum(axis=1).apply(sizeof_fmt)

for c in columns_to_convert:
    x[c] = x[c].apply(sizeof_fmt)

x.columns.name, x.index.name = None, None
print(x.reset_index())

Prints:

   ID CustomerName     Backup   Database       Site  BackupSizeBytes  DatabaseSizeBytes  SiteSizeBytes  TotalSize
0   1    CustomerA  199.6 MiB  199.6 MiB  199.6 MiB        209335296          209335296      209335296  598.9 MiB
1   2    CustomerB  199.6 MiB  199.6 MiB  199.6 MiB        209335296          209335296      209335296  598.9 MiB

Python - trouble pivoting, grouping, and summing dataframe columns

2 Answers2