0

Hi I have a complicated looping problem in that I have to loop through a dataframe of trading data (one trade per row). The trade data has trades against external counterparties and also internal "pending" counterparties and I need to:

(a) find three related trades and

(b) change the execution time on the first of three trades to match that of the third trade.

The reason this is three trades is that each trade has a execution time and an entry time (entry time is when the trade flows down to further systems and therefore is equal to or later than execution time). This must sound overly complicated but that is the way the system works so is a given. There is more detail in the code sample below. This works but slowly (hundreds of thousands of trades in the file). My python is rudimentary so I am assuming that there must be a much more efficient way of doing this possibly with .apply or something else? Anyone have any suggestions?

After comments below I have cleaned this up (apologies) and created a minimal working example (should have done that originally)

import pandas as pd
import numpy as np
# TradeId - unique trade id
# ExecutionTime - time trade was executed
# EntryTime - time trade entered processing system (equal to or after Execution time)
# Counterparty - counterparty name including external counterparties and internal pending
# TraderName - eg Bob Smith 
# CcyPair - eg GBPUSD
# BaseTrade - notional of the trade in base currency eg 100 GBP
allTradesArrays = {
'TradeId':[101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120],
'ExecutionDateTime':['06/26/2018 12:49:23','06/26/2018 13:20:12','06/26/2018 13:20:12','06/26/2018 13:20:32','06/26/2018 13:22:19','06/26/2018 13:25:13','06/26/2018 13:26:18','06/26/2018 15:50:42','06/26/2018 15:51:12','06/26/2018 15:51:12','06/26/2018 15:54:10','06/26/2018 16:17:09','06/26/2018 18:54:06','06/26/2018 18:54:12','06/26/2018 18:54:12','06/26/2018 18:54:15','06/26/2018 19:42:05','06/26/2018 19:58:25','06/26/2018 20:13:19','06/26/2018 20:13:19'],
'EntryDateTime':['06/26/2018 12:49:23','06/26/2018 13:25:13','06/26/2018 13:25:13','06/26/2018 13:20:33','06/26/2018 13:22:19','06/26/2018 13:25:13','06/26/2018 13:26:18','06/26/2018 15:50:42','06/26/2018 15:52:01','06/26/2018 15:54:10','06/26/2018 15:54:10','06/26/2018 16:17:11','06/26/2018 18:54:07','06/26/2018 18:54:30','06/26/2018 19:58:25','06/26/2018 18:54:16','06/26/2018 19:42:05','06/26/2018 19:58:25','06/26/2018 20:13:19','06/26/2018 20:13:19'],
'Counterparty':['cpty1','PENDING','cpty2','cpty12','cpty3','PENDING','cpty6','cpty2','PENDING','cpty8','PENDING','cpty9','cpty1','PENDING','cpty8','cpty3','cpty5','PENDING','cpty6','cpty2'],
'CcyPair':['GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD','GBPUSD'],
'BaseTrade':[5,-6.94037287,-6.94037287,-11.63,-0.77222412,6.94037287,21.88,-5.2,10,10,-10,0.3931098,2.5,-670,-670,2.029,20,670,7.37097,11.056455]
}
allTrades = pd.DataFrame(allTradesArrays)
print()
print('allTrades.info()')
print(allTrades.info())
print()
print('allTrades.head()')
print(allTrades.head())
# Create a dataframe that is a subset of the trades dataframe that 
# contains only trades against the pending counterparty - note the
# pending trades are still in the allTrades df as I need to keep
# allTrades complete for further processing once the time stamp changes 
# are made by this looping
pendingTrades = allTrades[allTrades['Counterparty'] == 'PENDING']
print()
print('pendingTrades.info()')
print(pendingTrades.info())
print()
print('pendingTrades.head()')
print(pendingTrades.head())
# iterate over each trade in pendingTrades as explained below
for pendingTradeIndex1, pendingTrade1 in pendingTrades.iterrows():
    for allTradeIndex, allTrade in allTrades.iterrows():
        if (
            # if we find a trade in allTrades that is: 
            # not the same pendingTrade1 trade that is also in the allTrades 
            pendingTrade1['TradeId'] != allTrade['TradeId']
            # has the same CcyPair
            and pendingTrade1['CcyPair'] == allTrade['CcyPair']
            # has the same notional and sign
            and pendingTrade1['BaseTrade'] == allTrade['BaseTrade'] 
            # has matching execution datetimes
            and pendingTrade1['ExecutionDateTime'] == allTrade['ExecutionDateTime']
            # then we have found the first two of three trades
        ):
            # To find the third trade apply similar logic
            for pendingTradeIndex2, pendingTrade2 in pendingTrades.iterrows():
                if (
                    # If we find the second trade in allTrades that is:
                    # not the same pendingTrade2 trade that is also in the allTrades 
                    pendingTrade2['TradeId'] != allTrade['TradeId']
                    # is the same CcyPair as the devon trade
                    and pendingTrade2['CcyPair'] == allTrade['CcyPair']
                    # has the same notional but opposite sign as the devon trade
                    and pendingTrade2['BaseTrade'] == -1*allTrade['BaseTrade'] 
                    # has matching entry datetimes as the devon trade
                    and pendingTrade2['EntryDateTime'] == allTrade['EntryDateTime']
                    # does not have matching execution datetimes as the devon trade
                    and pendingTrade2['ExecutionDateTime'] != allTrade['ExecutionDateTime']
                    # then we should have found the third of three trades
                    # note this third trade is always a pending counterparty trade
                ):
                    print('pendingTrade1 ' + str(pendingTrade1['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'BaseTrade'].values[0]))
                    print('allTrade      ' + str(allTrade['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==allTrade['TradeId'],'BaseTrade'].values[0]))
                    print('pendingTrade2 ' + str(pendingTrade2['TradeId']) + 
                          ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' EnTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'EntryDateTime'].values[0]) +
                          ' BaseTr=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'BaseTrade'].values[0]))
                    print('Changing ' + str(pendingTrade1['TradeId']) + ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade1['TradeId'],'ExecutionDateTime'].values[0]) +
                          ' to ' + str(pendingTrade2['TradeId']) + ' ExTime=' + 
                          str(allTrades.loc[allTrades['TradeId']==pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]))
                    allTrades.loc[allTrades['TradeId'] == pendingTrade1['TradeId'],'ExecutionDateTime'] = \
                    allTrades.loc[allTrades['TradeId'] == pendingTrade2['TradeId'],'ExecutionDateTime'].values[0]
                    print()
CT00
  • 77
  • 1
  • 2
  • 9
  • 2
    Iterating over pandas df is never a good idea. One thing you could do is apply nesting conditions or performing some manipulation try to use lambda, apply and make use of numpy for calculations. Would appreciate a specific question here. – mad_ Aug 30 '18 at 15:26
  • 3
    Can you produce a minimal reproducible example of your problem? It's unclear what `counterpartyFilter` is, among many other things – rahlf23 Aug 30 '18 at 15:26
  • I've cleaned up above and changed the code to a working self contained example thanks for looking at this – CT00 Aug 30 '18 at 16:26

1 Answers1

0

Give itertuples() a try. It is supposed to be much quicker than iterrows()

Stack Overflow: Does iterrows have performance issues

Sean
  • 581
  • 5
  • 20