0

Morning All,

Initial data:

df ={'Trading_Book': ['A', 'A', 'B', 'B','C','C','C',],
'Client': ['AA', 'AA', 'BB', 'BB','CC','CC','CC',],
'Ccy':[ 'AUD', 'AUD', 'AUD', 'AUD','AUD','AUD','AUD',],
'RFQ_State': ['Done', 'Done', 'Done', 'Traded Away','Traded Away', 'Traded Away','Done'],
'Competing_Dealers': [2, 4, 6, 8, 10, 12,14],
'Notional': [1000, 1000, 1000, 1000,1000,1000,1000],
        }

print(df)

Trading_Book    Client  Ccy RFQ_State   Competing_Dealers   Notional
A               AA      AUD Done        2                   1000
A               AA      AUD Done        4                   1000
B               BB      AUD Done        6                   1000
B               BB      AUD Traded Away 8                   1000
C               CC      AUD Traded Away 10                  1000
C               CC      AUD Traded Away 12                  1000
C               CC      AUD Done        14                  1000

My end state is an Excel report with Totals at the top and grouped By columns

Grand Totals                7           8                       7,000       4,000           0.571429 
Trading_Book    Client  Ccy Total_RFQs  Avg_Competing_Dealers   Notional    Notional_Done   Hit_Rate
    A               AA  AUD 2           3                       2,000       2,000           1
    B               BB  AUD 2           7                       2,000       1,000           0.5
    C               CC  AUD 3           12                      3,000       1,000           0.3333

My attempt:

a) Add Total Columns i.e. Total RFQ, Total Notional Done (if state ='Done' populate) and Hit Rates (Notional_Done / Notional) per Trading_Book, Client, Ccy grouping:

# Add the Notional where state = done
mask = df['RFQ_State'].str.contains('Done')
df.loc[mask, 'Notional_Done'] = df['Notional']
# Populate any nulls due to 0/0
df = df.fillna(0)
# Column for Done RFQ's used in Hit Rate
df['Done_RFQ'] = np.where(df['RFQ_State'].str.contains('Done'), 1, 0)
# Hit_Rate = Done Notional / Total Notional
df['Hit_Rate'] = (df['Notional_Done'] / df['Notional'])
df.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')

b) Group By

df_Detail = df.groupby(['Trading_Book', 'Client', 'Ccy']).agg({'RFQ_State': 'count',
                                                          'Competing_Dealers': 'mean',
                                                          'Notional': 'sum',
                                                          'Notional_Done': 'sum',
                                                        }).reset_index()
df_Detail = df_Detail.sort_values(['Trading_Book', 'Client'], ascending=[True, True])
df_Detail.rename(columns={
    'RFQ_State': 'Total_RFQs',
    'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Detail['Hit_Rate'] = (df_Detail['Notional_Done'] / df_Detail['Notional'])

print(df_Detail)
df_Detail.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')

Trading_Book    Client  Ccy Total_RFQs  Avg_Competing_Dealers   Notional    Notional_Done
    A               AA      AUD 2           3                       2000        2000
    B               BB      AUD 2           7                       2000        1000
    C               CC      AUD 3           12                      3000        0.3333333

c) Find the Grand Totals for each of these grouped by columns based on the original data:

df_Summay = df.groupby(['Ccy']).agg({'RFQ_State': 'count',
                                                          'Competing_Dealers': 'mean',
                                                          'Notional': 'sum',
                                                          'Notional_Done': 'sum',
                                                        }).reset_index()
df_Summay.rename(columns={
    'RFQ_State': 'Total_RFQs',
    'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Summay['Hit_Rate'] = (df_Summay['Notional_Done'] / df_Summay['Notional'])
df_Summay.to_csv('C:\EOM_ML_201901\outputs\Scratchpad3' + '.csv')
print(df_Summay)

    Ccy Total_RFQs  Avg_Competing_Dealers   Notional    Notional_Done   Hit_Rate
    AUD 7           8                       7000        4000            0.571429 

Issues:

  1. There may be many currencies in the underlying data so need to solve how to return only the Grand Totals from the original data without the grouping (AUD).
  2. Place these totals above the df_Detail and export. This is like an append of the two dataframes i.e. df_Summary (less the headers and AUD value) stacked on top of df_Detail
Peter Lucas
  • 1,979
  • 1
  • 16
  • 27
  • `There may be many currencies in the underlying data so need to solve how to return only the Grand Totals from the original data without the grouping (AUD).` - Not sure if understand, need filter only rows with `AUD` in first step before your solutions? – jezrael May 24 '21 at 07:47
  • I just need the Grand totals. To do this I grouped by the only column which is most likely to have one unique value in the dataframe. – Peter Lucas May 24 '21 at 08:07
  • Was thinking there may be a way to do this all in Pandas. Essentially stacking df with disimilar columns one on top of the other – Peter Lucas May 24 '21 at 08:15

2 Answers2

1

First add soem not exist columns in df_Summay and change order by df_Detail:

df_Summay['Trading_Book'] = 'Grand Totals'

df_Summay = df_Summay.reindex(df_Detail.columns, fill_value='', axis=1)
print(df_Summay)
   Trading_Book Client  Ccy  Total_RFQs  Avg_Competing_Dealers  Notional  \
0  Grand Totals         AUD           7                      8      7000   

   Notional_Done  Hit_Rate  
0         4000.0  0.571429  

And then use this function for append both DataFrames to same excel file:

filename = 'out.xlsx'
append_df_to_excel(filename, df_Summay, sheet_name='Sheet2',  header=None, index=False)
append_df_to_excel(filename, df_Detail, sheet_name='Sheet2', index=False, startrow=1)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

It really is as you have played out.

import pandas as pd

df ={'Trading_Book': ['A', 'A', 'B', 'B','C','C','C',],
'Client': ['AA', 'AA', 'BB', 'BB','CC','CC','CC',],
'Ccy':[ 'AUD', 'AUD', 'AUD', 'AUD','AUD','AUD','AUD',],
'RFQ_State': ['Done', 'Done', 'Done', 'Traded Away','Traded Away', 'Traded Away','Done'],
'Competing_Dealers': [2, 4, 6, 8, 10, 12,14],
'Notional': [1000, 1000, 1000, 1000,1000,1000,1000],
        }


df = pd.DataFrame(df)

# generate sub-totals
dfs = (df.groupby(["Trading_Book","Client","Ccy"])
 .agg(
     Total_RFQs=("RFQ_State","count"),
     Avg_Competing_Dealers=("Competing_Dealers","mean"),
     Notional=("Notional","sum"),
     Notional_Done=("Notional", lambda s: (df.loc[s.index,"RFQ_State"].eq("Done")*s).sum())
 )
 .assign(Hit_Rate=lambda d: d["Notional_Done"]/d["Notional"])
)

# generate grandtotals, use dummy column for groupby...
dft = (df.assign(grandtotal=1).groupby(["grandtotal"], as_index=False)
 .agg(
     Total_RFQs=("RFQ_State","count"),
     Avg_Competing_Dealers=("Competing_Dealers","mean"),
     Notional=("Notional","sum"),
     Notional_Done=("Notional", lambda s: (df.loc[s.index,"RFQ_State"].eq("Done")*s).sum())
 )
 .assign(Hit_Rate=lambda d: d["Notional_Done"]/d["Notional"])
       .drop(columns="grandtotal")
)

print(f"{dft.to_string(index=False)}\n{dfs.to_string()}")

output

 Total_RFQs  Avg_Competing_Dealers  Notional  Notional_Done  Hit_Rate
          7                      8      7000           4000  0.571429
                         Total_RFQs  Avg_Competing_Dealers  Notional  Notional_Done  Hit_Rate
Trading_Book Client Ccy                                                                      
A            AA     AUD           2                      3      2000           2000  1.000000
B            BB     AUD           2                      7      2000           1000  0.500000
C            CC     AUD           3                     12      3000           1000  0.333333

Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • Hi @Rob Raymond. Thanks, I am getting the following when running: Notional_Done=("Notional", lambda s: (df.loc[s.index,"RFQ_State"].eq("Done")*s).sum()) TypeError: aggregate() missing 1 required positional argument: 'arg' – Peter Lucas May 24 '21 at 23:04