1

Consider a pandas dataframe like:

>> df

date_time            op_type  price  volume
01-01-1970 9:30:01     ASK     100    1800 
01-01-1970 9:30:25     ASK      90    1000      
01-01-1970 9:30:28     BID      90     900
01-01-1970 9:30:28    TRADE     90     900
01-01-1970 9:31:01     BID      80     500
01-01-1970 9:31:09     ASK      80     100
01-01-1970 9:31:09    TRADE     80     100

I would like to do three calculations: i) the cumulative sum of the volume for op_type == "ASK" rows; ii) the cumulative sum of the volume for op_type == "BID" rows; and iii) the sum of the previous two volumes.

That is simple enough, but there is a condition for op_type == "TRADE" operations:

  1. Whenever there is a TRADE operation whose price matches the price on a BID operation, I would like to subtract that TRADE operation's volume from the cumulative BID volume.

  2. Whenever there is a TRADE operation whose price matches the price on an ASK operation, I would like to subtract that TRADE operation's volume from the cumulative ASK volume.

The output I'm looking for is:

>> df

date_time            op_type  price  volume  ASK_vol  BID_vol  BIDASK_vol
01-01-1970 9:30:01     ASK     100    1800    1800       0        1800
01-01-1970 9:30:25     ASK      90    1000    2800       0        2800
01-01-1970 9:30:28     BID      90     900    2800      900       3700
01-01-1970 9:30:28    TRADE     90     900    2800       0        2800
01-01-1970 9:31:01     BID      80     500    2800      500       3300
01-01-1970 9:31:09     ASK      80     100    2900      500       3400
01-01-1970 9:31:09    TRADE     80     100    2800      500       3300

I read this question but I'm not sure how to incorporate the conditional subtraction to that answer. I would greatly appreciate any help, thank you.

scrps93
  • 303
  • 4
  • 12

1 Answers1

1

IIUC, this is what you need.

a= np.where(df['op_type'] == 'ASK',df.volume,0)
b= np.where(df['op_type'] == 'BID',df.volume,0)
a_t = (np.where(df['op_type'] == 'TRADE',
          (np.where(df['op_type'].shift(1) == 'ASK',
                    (np.where(df['volume']==df['volume'].shift(1),-df.volume,0)),0)),0))
b_t = (np.where(df['op_type'] == 'TRADE',
          (np.where(df['op_type'].shift(1) == 'BID',
                    (np.where(df['volume']==df['volume'].shift(1),-df.volume,0)),0)),0))
df['ASK_vol']=(np.where(a_t!=0,a_t,a)).cumsum()
df['BID_vol']=(np.where(b_t!=0,b_t,b)).cumsum()
df['BIDASK_vol']= df['ASK_vol']+df['BID_vol']

output

           date_time    op_type     price   volume  ASK_vol BID_vol BIDASK_vol
01-01-1970  9:30:01     ASK         100     1800    1800    0       1800
01-01-1970  9:30:25     ASK         90      1000    2800    0       2800
01-01-1970  9:30:28     BID         90      900     2800    900     3700
01-01-1970  9:30:28     TRADE       90      900     2800    0       2800
01-01-1970  9:31:01     BID         80      500     2800    500     3300
01-01-1970  9:31:09     ASK         80      100     2900    500     3400
01-01-1970  9:31:09     TRADE       80      100     2800    500     3300

moys
  • 7,747
  • 2
  • 11
  • 42
  • 1
    Just a small observation, in lines 5 and 8 of your answer the condition `df['volume']==df['volume'].shift(1)` within the last `np.where()` should be `df['price']==df['price'].shift(1)` – scrps93 Sep 11 '19 at 18:25
  • 1
    oh! ok. I thought we were looking for matching volumes. But in-fact we are looking for matching prices. Good to know that you figured that out. – moys Sep 12 '19 at 01:53