Python: Pandas: percentage per column

Question

Problem
"How to pd.pivot_table with % (percentage) column where sum/ total is provided by df.pivot_table(margins=True)"

Context
We have a sample pivot:

import pandas as pd
import numpy as np

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC'))

print(df)
print("\n\n")
pivot = df.pivot_table(
    index=["A"],
    columns=[],
    values=["B", "C"],
    aggfunc={
        "B": np.sum,
        "C": np.sum,
    },
    margins=True,
    margins_name = "Total",
    fill_value=0

)

print(pivot)

Like this:

        B   C
A            
row1   50 200
row2  100 300
Total 150 500

Desired Output

        B   C   D     E
A            
row1   50 200   250   38.46%
row2  100 300   400   61.54%
Total 150 500   650   100%

In words, we essentially wish to add column E (pct of row & column total) to the output of pivot_table which is a percentage of the column total.

Note, to make the example a bit more readable we've added column 'D' which should not be part of the actual output.

Beyond that, the output format must stay like this as eventually we will output it to an excel sheet for business usage.

Tried so far
Similar questions were asked:

Add percent of total column to Pandas pivot_table
- Only does it with a groupby but as we need the columns to stay intact when we print() so that didn't really work for me
Pandas percentage of total with groupby
- Didn't work for me as we need the output format to stay intact which I haven't figured out so far

Furthermore, I was hoping that pandas has maybe found a neat way around this with the latest version so we can do it using df.pivot_table. They usually add some handy improvements to their iterations. :)

Specifications
Python: 3.5.2
Pandas: 0.18.1
Numpy: 1.11.1

score 0 · Answer 1 · answered Oct 23 '16 at 18:32

you could do something like this :

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC'))
df = df.set_index('A')
df['E'] = df.apply(lambda x: x/df.sum().sum()).sum(axis=1)
df.loc['Total'] = df.sum()
In[52]: df
Out[52]: 
           B      C         E
A                            
row1    50.0  200.0  0.384615
row2   100.0  300.0  0.615385
Total  150.0  500.0  1.000000

where

df.apply(lambda x: x/df.sum().sum())

dvided each element by the df.sum().sum() which is the sum of all elements.

.sum(axis=1)

sum every rows

and

df.loc['Total']

allows you to populate a new row with whatever you like

score 0 · Answer 2 · answered Oct 23 '16 at 18:55

Inspired by Steven G's approach, this solution worked for me:

import pandas as pd
import numpy as np

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC'))


#print(df)
print("\n\n")
pivot = df.pivot_table(
    index=["A"],
    columns=[],
    values=["B", "C"],
    aggfunc={
        "B": np.sum,
        "C": np.sum,

    },
    margins=True,
    margins_name = "Total",
    fill_value=0

)
print(pivot)

a = pd.DataFrame(pivot.ix["Total"]).transpose()["B"].values

pivot["E"] = pivot["B"].apply(lambda x: round(x/float(a), 2))

print(pivot)

OUTPUT

           B      C     E
A                        
row1    50.0  200.0  0.33
row2   100.0  300.0  0.67
Total  150.0  500.0  1.00

Python: Pandas: percentage per column

2 Answers2