1

I have one dataframe which contain many columns and i am trying to make pivot table like this

Data sample

program | InWappTable | InLeadExportTrack

VIC    |  True  | 1

VIC    | True   |1

VIC | True  |1

VIC | True  | 1

Here is my code

rec.groupby(['InWappTable', 'InLeadExportTrack','program']).size()

And Expected Output is

enter image description here

anky
  • 74,114
  • 11
  • 41
  • 70
Malik Asad
  • 441
  • 4
  • 15
  • can you provide a data sample and expected output please. – anky Feb 03 '19 at 17:35
  • I have provided sample and expected output – Malik Asad Feb 03 '19 at 17:46
  • it is advisable to paste the data as text, not images as we would not be able to copy the image. also your expected output has some rows which your input doesnot – anky Feb 03 '19 at 17:47
  • Actually i have big data set which contain many columns and thousands of rows thats why i m not able to share whole rows – Malik Asad Feb 03 '19 at 17:56
  • how `df.groupby(['program'])['InWappTable','InLeadExporttrack'].count().reset_index()` work? – anky Feb 03 '19 at 18:03
  • df.groupby(['program'])['InWappTable','InLeadExporttrack'].count().reset_index() missing the total value counts – Malik Asad Feb 03 '19 at 18:08
  • how about `df.pivot_table(index='program', values=['InWappTable', 'InLeadExporttrack'], aggfunc='count', margins=True, margins_name='Total')`? – Jacopo Repossi Feb 03 '19 at 18:26
  • @JacopoRepossi this is giving me incoorect total – Malik Asad Feb 03 '19 at 18:33
  • @MalikAsad incorrect compared to what expected result? I put 'count' as argument of `aggfunc` but could have been 'sum' or unique count.. – Jacopo Repossi Feb 03 '19 at 18:36
  • @anky_91 thanks for your time and your solution is correct but missing total that i calculated with this t_wapp=df['InWappTable'].sum() – Malik Asad Feb 03 '19 at 18:36
  • @JacopoRepossi I don't know why but aggfunc is not giving correct total rest of the value are same as groupby function – Malik Asad Feb 03 '19 at 18:40

2 Answers2

1

I do not believe that you require a pivot_table here, though a pivot_table approach with aggfunc can also be used effectively.

Here is how I approached this

Generate some data

a = [['program','InWappTable','InLeadExportTrack'],
     ['VIC',True,1],
     ['Mall',False,15],
     ['VIC',True,101],
     ['VIC',True,1],
     ['Mall',True,74],
     ['Mall',True,11],
     ['VIC',False,44]]
df = pd.DataFrame(a[1:], columns=a[0])
print(df)

  program  InWappTable  InLeadExportTrack
0     VIC         True                  1
1    Mall        False                 15
2     VIC         True                101
3     VIC         True                  1
4    Mall         True                 74
5    Mall         True                 11
6     VIC        False                 44

First do GROUP BY with count aggregation

df_grouped = df.groupby(['program']).count()
print(df_grouped)

         InWappTable  InLeadExportTrack
program                                
Mall               3                  3
VIC                4                  4

Then to get the sum of all columns

num_cols = ['InWappTable','InLeadExportTrack']
df_grouped[num_cols] = df_grouped[num_cols].astype(int)
df_grouped.loc['Total']= df_grouped.sum(axis=0)
df_grouped.reset_index(drop=False, inplace=True)
print(df_grouped)

  program  InWappTable  InLeadExportTrack
0    Mall            3                  3
1     VIC            4                  4
2   Total            7                  7

EDIT

Based on the comments in the OP, df_grouped = df.groupby(['program']).count() could be replaced by df_grouped = df.groupby(['program']).sum(). In this case, the output is shown below

  program  InWappTable  InLeadExportTrack
0    Mall            2                100
1     VIC            3                147
2   Total            5                247
edesz
  • 11,756
  • 22
  • 75
  • 123
  • 1
    why not `df.pivot_table(index='program', values=['InWappTable', 'InLeadExporttrack'], aggfunc='count', margins=True, margins_name='Total')`, which is a one line command? Am I missing something? – Jacopo Repossi Feb 03 '19 at 18:33
  • 1
    There is nothing wrong with your approach! I'll fix my wording. For me, just looking at the raw data from the OP, it seemed like `groupby` was more intuitive. Using the sample data I posted, I get the same answer using your approach as what I posted. :) – edesz Feb 03 '19 at 18:44
  • oh great! I thought I was missing something, also because in the comments above he says that `aggfunc` is not giving him the right total... – Jacopo Repossi Feb 03 '19 at 18:49
1

IIUC, you can try this:

df_new=df.groupby(['program'])['InWappTable','InLeadExporttrack'].count().reset_index()
total = df_new.sum()
total['program'] = 'Total'
df_new=df_new.append(total, ignore_index=True)
print(df_new)
anky
  • 74,114
  • 11
  • 41
  • 70