0

I have a dataframe that looks like this, which store the order by order information of an order book. Type N = new order, D = delete order and E = execution. the same order_id might be reused.

So basically the problem is that delete and execution does not have a proper price as they should be inferred by the last new order with the same oid. Could someone suggest a method to achieve this? Thank you

Input

       type        order_id      price
0         N              10         99
1         E              10          0
1         E              10          0
1         D              10          0
0         N              11         98
1         N              10         97
1         D              10          0

Output

       type        order_id      price
0         N              10         99
1         E              10         **99**
1         E              10         **99**
1         D              10         **99**
0         N              11         98
1         N              10         97
1         D              10         **97**
  • 1
    To achieve what? Your question is not clear. Through what code is your input going? Is the output provided what you have or what you want? – pawamoy Apr 03 '18 at 15:07
  • @jpp - I think it is uncorrect closed question, so reopened. – jezrael Apr 03 '18 at 15:16

2 Answers2

1

Seems like you need replace + ffill, since here I assuming you have the correct order of your df.

df.replace(0,np.nan).ffill()
Out[758]: 
  type  order_id  price
0    N        10   99.0
1    E        10   99.0
1    E        10   99.0
1    D        10   99.0
0    N        11   98.0
1    N        10   97.0
1    D        10   97.0

Or we adding groupby

df.assign(price=df.price.replace(0,np.nan)).groupby(df.type.eq('N').cumsum()).price.ffill().values
BENY
  • 317,841
  • 20
  • 164
  • 234
1

I think need:

df['price'] = df['price'].mask(df['type'].isin(['E','D']))
#df['price'] = df['price'].where(df['type'] == 'N')

df['price'] = df.groupby(df['order_id'].ne(df['order_id'].shift()).cumsum())['price'].ffill()

print (df)
  type  order_id  price
0    N        10   99.0
1    E        10   99.0
1    E        10   99.0
1    D        10   99.0
0    N        11   98.0
1    N        10   97.0
1    D        10   97.0

Explanation:

  1. First replace values of price to NaNs with mask or inverted condition with where
  2. Then groupby by helper Series created by consecutive order_id with forward fill NaNs by DataFrameGroupBy.ffill:
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252