Initializing new columns in pd.pivot_table using existing columns within pd.pivot_table

Question

I have table imported into python using pd.read_csv that looks as follows

I need to perform 3 activities on this table as follows

Calculate the number of free apps and paid apps in each genre using group by.

I wrote following code to get the desired output `

df.groupby(['prime_genre','Subscription'])[['id']].count()`

Output:

Convert the result from 1 into a dataframe having the columns as prime_genre, free, paid and the rows having the count of each

I wrote the following code to get the desired output

df1 = df.groupby(['prime_genre','Subscription'])['id'].count().reset_index() df1.pivot_table(index='prime_genre', columns='Subscription', values='id', aggfunc='sum')

Output:

Now I need to initialize a column 'Total' that captures the sum of 'free app' and 'paid app' within the pivot table itself
I also I need to initialize two more columns perc_free and perc_paid that displays the percentage of free apps and paid apps within the pivot table itself

How do I go about 3 & 4?

Please take a while to read: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Corralien, Jul 25 '21 at 16:19
@Anurag Dabas will this work for columns? I thought the margins would be for the rows' total. — mozway, Jul 25 '21 at 16:40

mozway · Accepted Answer · 2021-07-25T16:48:29.740

Assuming the following pivot table named df2:

subscription  free app  paid app
prime_genre                     
book                66        46
business            20        37
catalogs             9         1
education          132       321

You can calculate the total using pandas.DataFrame.sum on the columns (axis=1). Then divide df2 with this total and multiply by 100 to get the percentage. You can add a suffix to the columns with pandas.DataFrame.add_suffix. Finally, combine everything with pandas.concat:

total = df2.sum(axis=1)
percent = df2.div(total, axis=0).mul(100).add_suffix(' percent')
df2['Total'] = total
pd.concat([df2, percent], axis=1)

output:

subscription  free app  paid app  Total  free app percent  paid app percent
prime_genre                                                                
book                66        46    112         58.928571         41.071429
business            20        37     57         35.087719         64.912281
catalogs             9         1     10         90.000000         10.000000
education          132       321    453         29.139073         70.860927

Here is a variant to get the perc_free / perc_paid names:

total = df2.sum(axis=1)
percent = (df2.div(total, axis=0)
              .mul(100)
              .rename(columns=lambda x: re.sub('(.*)( app)', r'perc_\1',x))
          )
df2['Total'] = total
pd.concat([df2, percent], axis=1)

subscription  free app  paid app  Total  perc_free  perc_paid
prime_genre                                                  
book                66        46    112  58.928571  41.071429
business            20        37     57  35.087719  64.912281
catalogs             9         1     10  90.000000  10.000000
education          132       321    453  29.139073  70.860927

If I have to rename 'free app percent' as 'perc_free' and 'paid app percent' as 'perc_paid'. Then what changes shall I make? — Apekshit Dhoke, Jul 25 '21 at 16:57
Please read the latest version of the answer. I provided this use case. — mozway, Jul 25 '21 at 16:59

Initializing new columns in pd.pivot_table using existing columns within pd.pivot_table

1 Answers1