0

Problem
"How to pd.pivot_table with % (percentage) column where sum/ total is provided by df.pivot_table(margins=True)"

Context
We have a sample pivot:

import pandas as pd
import numpy as np

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC'))

print(df)
print("\n\n")
pivot = df.pivot_table(
    index=["A"],
    columns=[],
    values=["B", "C"],
    aggfunc={
        "B": np.sum,
        "C": np.sum,
    },
    margins=True,
    margins_name = "Total",
    fill_value=0

)

print(pivot)

Like this:

        B   C
A            
row1   50 200
row2  100 300
Total 150 500


Desired Output

        B   C   D     E
A            
row1   50 200   250   38.46%
row2  100 300   400   61.54%
Total 150 500   650   100%

In words, we essentially wish to add column E (pct of row & column total) to the output of pivot_table which is a percentage of the column total.

Note, to make the example a bit more readable we've added column 'D' which should not be part of the actual output.

Beyond that, the output format must stay like this as eventually we will output it to an excel sheet for business usage.

Tried so far
Similar questions were asked:


Furthermore, I was hoping that pandas has maybe found a neat way around this with the latest version so we can do it using df.pivot_table. They usually add some handy improvements to their iterations. :)

Specifications
Python: 3.5.2
Pandas: 0.18.1
Numpy: 1.11.1

Community
  • 1
  • 1
John
  • 141
  • 2
  • 6

2 Answers2

0

you could do something like this :

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC'))
df = df.set_index('A')
df['E'] = df.apply(lambda x: x/df.sum().sum()).sum(axis=1)
df.loc['Total'] = df.sum()
In[52]: df
Out[52]: 
           B      C         E
A                            
row1    50.0  200.0  0.384615
row2   100.0  300.0  0.615385
Total  150.0  500.0  1.000000

where

df.apply(lambda x: x/df.sum().sum())

dvided each element by the df.sum().sum() which is the sum of all elements.

.sum(axis=1)

sum every rows

and

df.loc['Total']

allows you to populate a new row with whatever you like

Steven G
  • 16,244
  • 8
  • 53
  • 77
0

Inspired by Steven G's approach, this solution worked for me:

import pandas as pd
import numpy as np

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC'))


#print(df)
print("\n\n")
pivot = df.pivot_table(
    index=["A"],
    columns=[],
    values=["B", "C"],
    aggfunc={
        "B": np.sum,
        "C": np.sum,

    },
    margins=True,
    margins_name = "Total",
    fill_value=0

)
print(pivot)

a = pd.DataFrame(pivot.ix["Total"]).transpose()["B"].values

pivot["E"] = pivot["B"].apply(lambda x: round(x/float(a), 2))

print(pivot)


OUTPUT

           B      C     E
A                        
row1    50.0  200.0  0.33
row2   100.0  300.0  0.67
Total  150.0  500.0  1.00
John
  • 141
  • 2
  • 6