My current dataframe looks like this:

df0:

reqs      code  hostname               file_path             filename  extension  date
51723330  404   services.compay.com    /folderA/folderB/               JPG        2018-09-13
50927945  404   services.company2.com  /folderA/folderB/               GIF        2018-09-15
50781228  404   services.companyB.com  /folderA/folderB/               JPG        2018-09-14
50554338  404   services.companyC.com  /folderA/folderB/...

What I would like to end up with is a table like this, with an added column (%reqs) that gives each row's percentage of the total requests, based on the reqs count:

    reqs      code  hostname               file_path             filename  extension  date        %reqs
    51723330  404   services.compay.com    /folderA/folderB/               JPG        2018-09-13  12%
    50927945  404   services.company2.com  /folderA/folderB/               GIF        2018-09-15  10%
    50781228  404   services.companyB.com  /folderA/folderB/               JPG        2018-09-14  11%
    50554338  404   services.companyC.com  /folderA/folderB/...                                   10%
    ...

I tried to follow this and got a little lost: Pandas percentage of total with groupby

df1 = df0.groupby(['code','hostname','file_path','filename','file_extension','date']).agg({'reqs': 'sum'})
df2 = df1.groupby(level=0).apply(lambda x: 100* x/float(x.sum()))

It doesn't look like the percentages were represented correctly, and this produced some strange results. I also think I need a step where, once I have the percentages, I merge them back into df0.

chowpay
  • How exactly did you get the `%reqs` values in your example? For example, row 1 has 12% in it. Does that mean that those 51723330 requests are 12% of all requests in the group defined by (code, hostname, file_path, filename, extension, date) together? – Peter Leimbigler Sep 21 '18 at 00:07
  • I made those numbers up but yah 51723330 would be "12%" of the total out of all the reqs if you were to sum it up – chowpay Sep 21 '18 at 00:12
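
Going by the clarification in the comments (each row's %reqs is its share of the sum of all reqs), one possible approach is to divide the reqs column by its own total, which needs neither a groupby nor a merge back into df0. The sketch below is only an illustration under assumptions: the sample frame is a hypothetical stand-in built from the first three rows shown in the question, the column is assumed to be named `extension` as in the table (the attempted code uses `file_extension`), and the `%reqs_in_code` and `pretty` names are invented for the example. If a per-group percentage is wanted instead, `groupby(...).transform("sum")` returns totals aligned to the original rows, so the result can be assigned straight onto df0.

```python
import pandas as pd

# Hypothetical stand-in for df0, built from the first three rows in the question.
df0 = pd.DataFrame({
    "reqs": [51723330, 50927945, 50781228],
    "code": [404, 404, 404],
    "hostname": ["services.compay.com", "services.company2.com", "services.companyB.com"],
    "file_path": ["/folderA/folderB/"] * 3,
    "extension": ["JPG", "GIF", "JPG"],
    "date": ["2018-09-13", "2018-09-15", "2018-09-14"],
})

# Share of the grand total: no groupby or merge needed.
df0["%reqs"] = 100 * df0["reqs"] / df0["reqs"].sum()

# Alternative (assumption: percentages within each status code are wanted).
# transform("sum") keeps df0's index, so the division lines up row by row.
group_total = df0.groupby("code")["reqs"].transform("sum")
df0["%reqs_in_code"] = 100 * df0["reqs"] / group_total

# Optional display-only formatting; keep the numeric column for further math.
pretty = df0["%reqs"].map("{:.1f}%".format)
print(df0[["reqs", "hostname", "%reqs"]])
print(pretty)
```

Keeping `%reqs` numeric and formatting it only for display avoids turning the column into strings, which would make later sorting or arithmetic awkward.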
