Morning All,
Initial data:
df ={'Trading_Book': ['A', 'A', 'B', 'B','C','C','C',],
'Client': ['AA', 'AA', 'BB', 'BB','CC','CC','CC',],
'Ccy':[ 'AUD', 'AUD', 'AUD', 'AUD','AUD','AUD','AUD',],
'RFQ_State': ['Done', 'Done', 'Done', 'Traded Away','Traded Away', 'Traded Away','Done'],
'Competing_Dealers': [2, 4, 6, 8, 10, 12,14],
'Notional': [1000, 1000, 1000, 1000,1000,1000,1000],
}
print(df)
Trading_Book Client Ccy RFQ_State Competing_Dealers Notional
A AA AUD Done 2 1000
A AA AUD Done 4 1000
B BB AUD Done 6 1000
B BB AUD Traded Away 8 1000
C CC AUD Traded Away 10 1000
C CC AUD Traded Away 12 1000
C CC AUD Done 14 1000
My end state is an Excel report with Totals at the top and grouped By columns
Grand Totals 7 8 7,000 4,000 0.571429
Trading_Book Client Ccy Total_RFQs Avg_Competing_Dealers Notional Notional_Done Hit_Rate
A AA AUD 2 3 2,000 2,000 1
B BB AUD 2 7 2,000 1,000 0.5
C CC AUD 3 12 3,000 1,000 0.3333
My attempt:
a) Add Total Columns i.e. Total RFQ, Total Notional Done (if state
='Done' populate) and Hit Rates (Notional_Done / Notional) per Trading_Book
, Client
, Ccy
grouping:
# Add the Notional where state = done
mask = df['RFQ_State'].str.contains('Done')
df.loc[mask, 'Notional_Done'] = df['Notional']
# Populate any nulls due to 0/0
df = df.fillna(0)
# Column for Done RFQ's used in Hit Rate
df['Done_RFQ'] = np.where(df['RFQ_State'].str.contains('Done'), 1, 0)
# Hit_Rate = Done Notional / Total Notional
df['Hit_Rate'] = (df['Notional_Done'] / df['Notional'])
df.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')
b) Group By
df_Detail = df.groupby(['Trading_Book', 'Client', 'Ccy']).agg({'RFQ_State': 'count',
'Competing_Dealers': 'mean',
'Notional': 'sum',
'Notional_Done': 'sum',
}).reset_index()
df_Detail = df_Detail.sort_values(['Trading_Book', 'Client'], ascending=[True, True])
df_Detail.rename(columns={
'RFQ_State': 'Total_RFQs',
'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Detail['Hit_Rate'] = (df_Detail['Notional_Done'] / df_Detail['Notional'])
print(df_Detail)
df_Detail.to_csv('C:\EOM_ML_201901\outputs\Scratchpad2' + '.csv')
Trading_Book Client Ccy Total_RFQs Avg_Competing_Dealers Notional Notional_Done
A AA AUD 2 3 2000 2000
B BB AUD 2 7 2000 1000
C CC AUD 3 12 3000 0.3333333
c) Find the Grand Totals for each of these grouped by columns based on the original data:
df_Summay = df.groupby(['Ccy']).agg({'RFQ_State': 'count',
'Competing_Dealers': 'mean',
'Notional': 'sum',
'Notional_Done': 'sum',
}).reset_index()
df_Summay.rename(columns={
'RFQ_State': 'Total_RFQs',
'Competing_Dealers': 'Avg_Competing_Dealers'
}, inplace=True)
# Hit_Rate = Done Notional / Total Notional
df_Summay['Hit_Rate'] = (df_Summay['Notional_Done'] / df_Summay['Notional'])
df_Summay.to_csv('C:\EOM_ML_201901\outputs\Scratchpad3' + '.csv')
print(df_Summay)
Ccy Total_RFQs Avg_Competing_Dealers Notional Notional_Done Hit_Rate
AUD 7 8 7000 4000 0.571429
Issues:
- There may be many currencies in the underlying data so need to solve how to return only the Grand Totals from the original data without the grouping (AUD).
- Place these totals above the df_Detail and export. This is like an append of the two dataframes i.e.
df_Summary
(less the headers and AUD value) stacked on top ofdf_Detail