Conditional groupby and shift in pandas

Question

I have a dataframe that looks like this, which store the order by order information of an order book. Type N = new order, D = delete order and E = execution. the same order_id might be reused.

So basically the problem is that delete and execution does not have a proper price as they should be inferred by the last new order with the same oid. Could someone suggest a method to achieve this? Thank you

Input

       type        order_id      price
0         N              10         99
1         E              10          0
1         E              10          0
1         D              10          0
0         N              11         98
1         N              10         97
1         D              10          0

Output

       type        order_id      price
0         N              10         99
1         E              10         **99**
1         E              10         **99**
1         D              10         **99**
0         N              11         98
1         N              10         97
1         D              10         **97**

To achieve what? Your question is not clear. Through what code is your input going? Is the output provided what you have or what you want? — pawamoy, Apr 03 '18 at 15:07
@jpp - I think it is uncorrect closed question, so reopened. — jezrael, Apr 03 '18 at 15:16

BENY · Answer 1 · 2018-04-03T15:20:01.797

Seems like you need replace + ffill, since here I assuming you have the correct order of your df.

df.replace(0,np.nan).ffill()
Out[758]: 
  type  order_id  price
0    N        10   99.0
1    E        10   99.0
1    E        10   99.0
1    D        10   99.0
0    N        11   98.0
1    N        10   97.0
1    D        10   97.0

Or we adding groupby

df.assign(price=df.price.replace(0,np.nan)).groupby(df.type.eq('N').cumsum()).price.ffill().values

jezrael · Answer 2 · 2018-04-03T15:31:42.570

I think need:

df['price'] = df['price'].mask(df['type'].isin(['E','D']))
#df['price'] = df['price'].where(df['type'] == 'N')

df['price'] = df.groupby(df['order_id'].ne(df['order_id'].shift()).cumsum())['price'].ffill()

print (df)
  type  order_id  price
0    N        10   99.0
1    E        10   99.0
1    E        10   99.0
1    D        10   99.0
0    N        11   98.0
1    N        10   97.0
1    D        10   97.0

Explanation:

First replace values of price to NaNs with mask or inverted condition with where
Then groupby by helper Series created by consecutive order_id with forward fill NaNs by DataFrameGroupBy.ffill:

Conditional groupby and shift in pandas

2 Answers2