Using Groupby within a for loop

Question

I have the following DataFrame

If df['Time'] and df['OrderID'] are the same, and df['MessageType'] is 'D' followed by 'A', then remove the row that contains 'D' and rename the value 'A' to 'AMEND'. Here's my code:

import pandas as pd

Instrument = df['Symbol']
Date = df['Date']
Time = df['Time']
RecordType = df['MessageType']
Price = df['Price']
Volume = df['Quantity']
Qualifiers = df['ExchangeOrderType']
OrderID = df['OrderID']
MatchID = df['MatchID']
Side = df['Side']

for i in range(len(Time)-1):
    if((Time[i] == Time[i+1]) & (RecordType[i] == "D") & (RecordType[i+1] == "A")):
        del Instrument[i]
        del Date[i]
        del Time[i]
        del RecordType[i]
        del Price[i]
        del Volume[i]
        del Qualifiers[i]
        del OrderID[i]
        del Side[i]
        RecordType[i+1] = "AMEND" # rename the message type

# creating a new dataframe with updated lists
new_df = pd.DataFrame({'Instrument':Instrument, 'Date':Date, 'Time':Time, 'RecordType':RecordType, 'Price':Price, 'Volume':Volume, 'Qualifiers':Qualifiers, 'OrderID':OrderID, 'MatchID':MatchID, 'Side':Side}).reset_index(drop=True)

new_df['RecordType']=np.where(new_df['RecordType'] =='O', 'CONTROL', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='A', 'ENTER', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='D', 'DELETE', new_df['RecordType'])

However, I have many different Symbol and Date and wish to use groupby in the for loop. I tried grouped = df.groupby(['Symbol', 'Date']) and replaced df with grouped but it didn't work. Also, I realize that my code is index sensitive, i.e., it must start with index zero for the for loop to work. I'm not sure if groupby will cause index problem or not.

Please help.

Thank you.

David Erickson · Accepted Answer · 2020-06-08T03:36:05.863

A good solution is to use np.where() for the conditions you have mentioned and .shift(-1) to compare to the next row. You can add more conditions (e.g. a condition for the df['Symbol'] column).

import pandas as pd, numpy as np
    import pandas as pd, numpy as np
df = pd.DataFrame({'Symbol': ['A2M', 'A2M', 'A2M'],
                   'Time' : ['14:00:02 678544300', '07:00:02 678544300', '07:00:02 678544300'],
                  'MessageType' : ['D', 'D', 'A'],
                  'OrderID' : ['72222771064878939976', '72222771064878939976', '72222771064878939976'],
                  'Date' : ['2020-01-02', '2020-01-02', '2020-01-02']})
df['MessageType'] = np.where((df['MessageType'] == 'D') & (df['MessageType'].shift(-1) == 'A') &
                             (df['Date'] == df['Date'].shift(-1)) & (df['Time'] == df['Time'].shift(-1)) &
                             (df['Symbol'] == df['Symbol'].shift(-1)) &
                            (df['OrderID'] == df['OrderID'].shift(-1)), 'AMEND', df['MessageType'])
df

Output:

    Symbol  Time                MessageType  OrderID                    Date
0   A2M     14:00:02 678544300  D            72222771064878939976   2020-01-02
1   A2M     07:00:02 678544300  AMEND        72222771064878939976   2020-01-02
2   A2M     07:00:02 678544300  A            72222771064878939976   2020-01-02

For all your future posts, please consider this post: How to make good reproducible pandas examples You should not include an image. As you can see, I was forced to create a sample dataframe. You can simply copy and paste the data into your answer (and should do that), and then format it or you can do df.to_dict() and copy ans paste that into your SatackOverFlow question. See the link.

Thank you. I will learn to write better next time. Looking at your solution, how can I use groupby in this case (because I have over a hundred of Symbol and Date) and rename that particular row to 'AMEND' after removing 'remove' row? — Shaun Lim, Jun 08 '20 at 00:48
I just realised that groupby is not needed with your codes because you're comparing Date as well. I just have to add (df['Symbol'] == df['Symbol'].shift(-1). However, I need that D and A rows to be converted into AMEND message type, not just filter out 'remove'. What's the best way to do that? — Shaun Lim, Jun 08 '20 at 01:03
df['MessageType'] = np.where((df['MessageType'] == 'A') & (df['MessageType'].shift(+1) == 'remove') & (df['Symbol'] == df['Symbol'].shift(+1)) & (df['Date'] == df['Date'].shift(+1)) & (df['Time'] == df['Time'].shift(+1)) & (df['OrderID'] == df['OrderID'].shift(+1)), 'AMEND', df['MessageType']) — Shaun Lim, Jun 08 '20 at 01:12
@ShaunLim not quite! You need to use `shift(-1)` NOT `shift(+1)` (at least the way I'm doing it). However, I have updated the answer, which should give you the output! Please accept as answer and upvote if helpful. — David Erickson, Jun 08 '20 at 03:37

Using Groupby within a for loop

1 Answers1