0

I have table imported into python using pd.read_csv that looks as follows

dataset

I need to perform 3 activities on this table as follows

  1. Calculate the number of free apps and paid apps in each genre using group by.

I wrote following code to get the desired output `

df.groupby(['prime_genre','Subscription'])[['id']].count()`

Output:

code1_output

  1. Convert the result from 1 into a dataframe having the columns as prime_genre, free, paid and the rows having the count of each

I wrote the following code to get the desired output

df1 = df.groupby(['prime_genre','Subscription'])['id'].count().reset_index() df1.pivot_table(index='prime_genre', columns='Subscription', values='id', aggfunc='sum')

Output:

code2_output

  1. Now I need to initialize a column 'Total' that captures the sum of 'free app' and 'paid app' within the pivot table itself

  2. I also I need to initialize two more columns perc_free and perc_paid that displays the percentage of free apps and paid apps within the pivot table itself

How do I go about 3 & 4?

1 Answers1

1

Assuming the following pivot table named df2:

subscription  free app  paid app
prime_genre                     
book                66        46
business            20        37
catalogs             9         1
education          132       321

You can calculate the total using pandas.DataFrame.sum on the columns (axis=1). Then divide df2 with this total and multiply by 100 to get the percentage. You can add a suffix to the columns with pandas.DataFrame.add_suffix. Finally, combine everything with pandas.concat:

total = df2.sum(axis=1)
percent = df2.div(total, axis=0).mul(100).add_suffix(' percent')
df2['Total'] = total
pd.concat([df2, percent], axis=1)

output:

subscription  free app  paid app  Total  free app percent  paid app percent
prime_genre                                                                
book                66        46    112         58.928571         41.071429
business            20        37     57         35.087719         64.912281
catalogs             9         1     10         90.000000         10.000000
education          132       321    453         29.139073         70.860927

Here is a variant to get the perc_free / perc_paid names:

total = df2.sum(axis=1)
percent = (df2.div(total, axis=0)
              .mul(100)
              .rename(columns=lambda x: re.sub('(.*)( app)', r'perc_\1',x))
          )
df2['Total'] = total
pd.concat([df2, percent], axis=1)
subscription  free app  paid app  Total  perc_free  perc_paid
prime_genre                                                  
book                66        46    112  58.928571  41.071429
business            20        37     57  35.087719  64.912281
catalogs             9         1     10  90.000000  10.000000
education          132       321    453  29.139073  70.860927
mozway
  • 194,879
  • 13
  • 39
  • 75