1

Dataframe:

df = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)
  
   Client            Result
0       A           Covered
1       A   Customer Reject
2       A  Customer Timeout
3       B     Dealer Reject
4       B    Dealer Timeout
5       B              Done
6       B      Tied Covered
7       B         Tied Done
8       B  Tied Traded Away
9       B       Traded Away
10      C            No RFQ
11      D           Covered
12      D   Customer Reject
13      D  Customer Timeout
14      D     Dealer Reject
15      D    Dealer Timeout
16      D              Done
17      D      Tied Covered
18      D         Tied Done
19      D  Tied Traded Away
20      D       Traded Away
21      D            No RFQ

Current Output:

df = df.groupby(['Client','Result']).agg({'Result': 'size'})
print(df)


                     Result
Client Result                  
A      Covered                1
       Customer Reject        1
       Customer Timeout       1
B      Dealer Reject          1
       Dealer Timeout         1
       Done                   1
       Tied Covered           1
       Tied Done              1
       Tied Traded Away       1
       Traded Away            1
C      No RFQ                 1
D      Covered                1
       Customer Reject        1
       Customer Timeout       1
       Dealer Reject          1
       Dealer Timeout         1
       Done                   1
       No RFQ                 1
       Tied Covered           1
       Tied Done              1
       Tied Traded Away       1
       Traded Away            1

Desired Output:

a. Space between groupings

b. Totals for each grouping

c. All non present Result for each client has a zero

Note the total possible values in Result is as follows (11 strings). These may or may not be present in the current dataset for the month:

Covered
Customer Reject
Customer Timeout
Dealer Reject
Dealer Timeout
Done
Tied Covered
Tied Done
Tied Traded Away
Traded Away
No RFQ

Client Result              Count
A      Covered             1
A      Customer Reject     1
A      Customer Timeout    1
A      Dealer Reject       0
A      Dealer Timeout      0
A      Done                0
A      Tied Covered        0
A      Tied Done           0
A      Tied Traded Away    0
A      Traded Away         0
A      No RFQ              0
Total                      3

Client Result              Count        
B      Covered             0
B      Customer Reject     0
B      Customer Timeout    0
B      Dealer Reject       1
B      Dealer Timeout      1
B      Done                1
B      Tied Covered        1
B      Tied Done           1
B      Tied Traded Away    1
B      Traded Away         1
A      No RFQ              1
Total                      8            
        
Client Result              Count        
C      Covered             0
C      Customer Reject     0
C      Customer Timeout    0
C      Dealer Reject       0
C      Dealer Timeout      0
C      Done                0
C      Tied Covered        0
C      Tied Done           0
C      Tied Traded Away    0
C      Traded Away         0
C      No RFQ              1
Total                      1                        

Client Result              Count        
D      Covered             1
D      Customer Reject     1
D      Customer Timeout    1
D      Dealer Reject       1
D      Dealer Timeout      1
D      Done                1
D      Tied Covered        1
D      Tied Done           1
D      Tied Traded Away    1
D      Traded Away         1
D      No RFQ              1
Total                      11                       
Peter Lucas
  • 1,979
  • 1
  • 16
  • 27

4 Answers4

1

Use .reindex with pd.MultiIndex.from_product:

Setup:

df = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)

Code:

cols = ['Client', 'Result']
i = [df['Client'].unique(), df['Result'].unique()]
df = df.groupby(cols)['Result'].size().rename('Count').reset_index().set_index(cols)
df = df.reindex(index=pd.MultiIndex.from_product(i, names=cols), fill_value=0).reset_index()
df
Out[1]: 
   Client            Result  Count
0       A           Covered      1
1       A   Customer Reject      1
2       A  Customer Timeout      1
3       A     Dealer Reject      0
4       A    Dealer Timeout      0
5       A              Done      0
6       A      Tied Covered      0
7       A         Tied Done      0
8       A  Tied Traded Away      0
9       A       Traded Away      0
10      A            No RFQ      0
11      B           Covered      0
12      B   Customer Reject      0
13      B  Customer Timeout      0
14      B     Dealer Reject      1
15      B    Dealer Timeout      1
16      B              Done      1
17      B      Tied Covered      1
18      B         Tied Done      1
19      B  Tied Traded Away      1
20      B       Traded Away      1
21      B            No RFQ      0
22      C           Covered      0
23      C   Customer Reject      0
24      C  Customer Timeout      0
25      C     Dealer Reject      0
26      C    Dealer Timeout      0
27      C              Done      0
28      C      Tied Covered      0
29      C         Tied Done      0
30      C  Tied Traded Away      0
31      C       Traded Away      0
32      C            No RFQ      1
33      D           Covered      1
34      D   Customer Reject      1
35      D  Customer Timeout      1
36      D     Dealer Reject      1
37      D    Dealer Timeout      1
38      D              Done      1
39      D      Tied Covered      1
40      D         Tied Done      1
41      D  Tied Traded Away      1
42      D       Traded Away      1
43      D            No RFQ      1

To display the results:

for c in df['Client'].unique():
    x = df[df['Client'] == c]['Count'].sum()
    print(df[df['Client'] == c])
    print(f'Total {x}')
    print('')

      Client            Result  Count
0       A           Covered      1
1       A   Customer Reject      1
2       A  Customer Timeout      1
3       A     Dealer Reject      0
4       A    Dealer Timeout      0
5       A              Done      0
6       A      Tied Covered      0
7       A         Tied Done      0
8       A  Tied Traded Away      0
9       A       Traded Away      0
10      A            No RFQ      0
Total 3

   Client            Result  Count
11      B           Covered      0
12      B   Customer Reject      0
13      B  Customer Timeout      0
14      B     Dealer Reject      1
15      B    Dealer Timeout      1
16      B              Done      1
17      B      Tied Covered      1
18      B         Tied Done      1
19      B  Tied Traded Away      1
20      B       Traded Away      1
21      B            No RFQ      0
Total 7

   Client            Result  Count
22      C           Covered      0
23      C   Customer Reject      0
24      C  Customer Timeout      0
25      C     Dealer Reject      0
26      C    Dealer Timeout      0
27      C              Done      0
28      C      Tied Covered      0
29      C         Tied Done      0
30      C  Tied Traded Away      0
31      C       Traded Away      0
32      C            No RFQ      1
Total 1

   Client            Result  Count
33      D           Covered      1
34      D   Customer Reject      1
35      D  Customer Timeout      1
36      D     Dealer Reject      1
37      D    Dealer Timeout      1
38      D              Done      1
39      D      Tied Covered      1
40      D         Tied Done      1
41      D  Tied Traded Away      1
42      D       Traded Away      1
43      D            No RFQ      1
Total 11
David Erickson
  • 16,433
  • 2
  • 19
  • 35
1

You can use pandas.get_dummies followed by .groupby:

df = df.set_index('Client')
df = pd.get_dummies(df, prefix='', prefix_sep='').groupby(df.index).apply(np.sum).T

print(df)

Prints:

Client            A  B  C  D
Covered           1  0  0  1
Customer Reject   1  0  0  1
Customer Timeout  1  0  0  1
Dealer Reject     0  1  0  1
Dealer Timeout    0  1  0  1
Done              0  1  0  1
No RFQ            0  0  1  1
Tied Covered      0  1  0  1
Tied Done         0  1  0  1
Tied Traded Away  0  1  0  1
Traded Away       0  1  0  1

To pretty print the data:

def print_group(data, client):
    df = pd.DataFrame({'Result': data.index, 'Count': data.values.tolist()})
    df['Client'] = client
    df = df.set_index('Client')
    print(df[['Result', 'Count']])
    print('Total {}'.format(np.sum(df['Count'])))
    print()


df = df.set_index('Client')
df = pd.get_dummies(df, prefix='', prefix_sep='').groupby(df.index).apply(np.sum).T
    
for c in df.columns:
    print_group(df[c], c)

Prints:

                  Result  Count
Client                         
A                Covered      1
A        Customer Reject      1
A       Customer Timeout      1
A          Dealer Reject      0
A         Dealer Timeout      0
A                   Done      0
A                 No RFQ      0
A           Tied Covered      0
A              Tied Done      0
A       Tied Traded Away      0
A            Traded Away      0
Total 3

                  Result  Count
Client                         
B                Covered      0
B        Customer Reject      0
B       Customer Timeout      0
B          Dealer Reject      1
B         Dealer Timeout      1
B                   Done      1
B                 No RFQ      0
B           Tied Covered      1
B              Tied Done      1
B       Tied Traded Away      1
B            Traded Away      1
Total 7

                  Result  Count
Client                         
C                Covered      0
C        Customer Reject      0
C       Customer Timeout      0
C          Dealer Reject      0
C         Dealer Timeout      0
C                   Done      0
C                 No RFQ      1
C           Tied Covered      0
C              Tied Done      0
C       Tied Traded Away      0
C            Traded Away      0
Total 1

                  Result  Count
Client                         
D                Covered      1
D        Customer Reject      1
D       Customer Timeout      1
D          Dealer Reject      1
D         Dealer Timeout      1
D                   Done      1
D                 No RFQ      1
D           Tied Covered      1
D              Tied Done      1
D       Tied Traded Away      1
D            Traded Away      1
Total 11
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
1

To create the subtotals, I use the same method as in Transform pandas groupby result with subtotals to relative values.

There is probably another method to create the index for the entries that are not present in the original data, but pivot_table does the trick.

data = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}

df = pd.DataFrame.from_dict(data)
# create a pivot_table and fill the empty sites with "0", the unstack again
# and group as required.
pvt = df.pivot_table(index='Client', columns = 'Result', aggfunc='size').fillna(0).unstack().groupby(by=['Client', 'Result']).agg(sum).reset_index()
print(pvt)

# calculate the subtotals
iList = ["Client", "Result"]
pvt2 = pd.concat(
    [pvt.assign(**{x: "" for x in iList[i:]}).groupby(iList).sum() for i in range(1, 3)]
).sort_index().reset_index()

# pretty-print the result
for c in pvt['Client'].drop_duplicates():
    print(pvt2.loc[pvt2['Client']==c].iloc[1:])
    print(f"Total: {pvt2.loc[pvt2['Client']==c].iloc[0].values[2]:>10}")
    print()

which yields

   Client            Result    0
1       A           Covered  1.0
2       A   Customer Reject  1.0
3       A  Customer Timeout  1.0
4       A     Dealer Reject  0.0
5       A    Dealer Timeout  0.0
6       A              Done  0.0
7       A            No RFQ  0.0
8       A      Tied Covered  0.0
9       A         Tied Done  0.0
10      A  Tied Traded Away  0.0
11      A       Traded Away  0.0
Total:        3.0

   Client            Result    0
13      B           Covered  0.0
14      B   Customer Reject  0.0
15      B  Customer Timeout  0.0
16      B     Dealer Reject  1.0
17      B    Dealer Timeout  1.0
18      B              Done  1.0
19      B            No RFQ  0.0
20      B      Tied Covered  1.0
21      B         Tied Done  1.0
22      B  Tied Traded Away  1.0
23      B       Traded Away  1.0
Total:        7.0

   Client            Result    0
25      C           Covered  0.0
26      C   Customer Reject  0.0
27      C  Customer Timeout  0.0
28      C     Dealer Reject  0.0
29      C    Dealer Timeout  0.0
30      C              Done  0.0
31      C            No RFQ  1.0
32      C      Tied Covered  0.0
33      C         Tied Done  0.0
34      C  Tied Traded Away  0.0
35      C       Traded Away  0.0
Total:        1.0

   Client            Result    0
37      D           Covered  1.0
38      D   Customer Reject  1.0
39      D  Customer Timeout  1.0
40      D     Dealer Reject  1.0
41      D    Dealer Timeout  1.0
42      D              Done  1.0
43      D            No RFQ  1.0
44      D      Tied Covered  1.0
45      D         Tied Done  1.0
46      D  Tied Traded Away  1.0
47      D       Traded Away  1.0
Total:       11.0

I am adding the empty lines through the print. Adding empty records in the dataframe does not seem to be a good idea, because they will go away when re-order.

Similarly, changing the record for the total rows to "Total" would be problematic because you do not know which group they belong to once the dataframe gets reordered.

Slightly modified code without the pivot_table:

import pandas as pd

data = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}

df = pd.DataFrame.from_dict(data)

cols = ["Client", "Result"]
ind1 =['A', 'B', 'C', 'D']
ind2 = ["Covered",
"Customer Reject",
"Customer Timeout",
"Dealer Reject",
"Dealer Timeout",
"Done",
"Tied Covered",
"Tied Done",
"Tied Traded Away",
"Traded Away",
"No RFQ"]

idx = pd.MultiIndex.from_product([ind1, ind2], names=cols)
pvt = df.groupby(cols).size().reindex(idx).fillna(0).reset_index()

pvt2 = pd.concat(
    [pvt.assign(**{x: "" for x in cols[i:]}).groupby(cols).sum() for i in range(1, 3)]
).sort_index().reset_index()

# print(pvt2)

for c in pvt['Client'].drop_duplicates():
    print(pvt2.loc[pvt2['Client']==c].iloc[1:])
    print(f"Total: {pvt2.loc[pvt2['Client']==c].iloc[0].values[2]:>10}")
    print()
divingTobi
  • 2,044
  • 10
  • 25
0

The solution could be melt and pivot:

df = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)

groups = pd.melt((
    df
    .assign(Count=1)
    .groupby(['Client','Result'],as_index=False)
    .sum()
    .pivot(index='Client', columns='Result', values='Count')
    .fillna(0)
    .reset_index()
    )
    ,id_vars=['Client']).rename(columns={'value':'Count'}).sort_values(['Client','Count'])

for i,rows in groups.groupby('Client').sum().iterrows():
    print(groups[lambda x: x['Client']==i].set_index('Client'))
    print('Total\t\t\t ',rows['Count'])
    print('\n')
Mehdi Golzadeh
  • 2,594
  • 1
  • 16
  • 28