I have the following DataFrame
If df['Time'] and df['OrderID'] are the same, and df['MessageType'] is 'D' followed by 'A', then remove the row that contains 'D' and rename the value 'A' to 'AMEND'. Here's my code:
import pandas as pd
import numpy as np  # needed for the np.where calls further down
Instrument = df['Symbol']
Date = df['Date']
Time = df['Time']
RecordType = df['MessageType']
Price = df['Price']
Volume = df['Quantity']
Qualifiers = df['ExchangeOrderType']
OrderID = df['OrderID']
MatchID = df['MatchID']
Side = df['Side']
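for i in range(len(Time)-1):
    # a 'D' row immediately followed by an 'A' row with the same Time and OrderID
    if (Time[i] == Time[i+1]) and (OrderID[i] == OrderID[i+1]) and (RecordType[i] == "D") and (RecordType[i+1] == "A"):
        del Instrument[i]
        del Date[i]
        del Time[i]
        del RecordType[i]
        del Price[i]
        del Volume[i]
        del Qualifiers[i]
        del OrderID[i]
        del MatchID[i]  # drop this one too, so every column keeps the same index
        del Side[i]
        RecordType[i+1] = "AMEND" # rename the message type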
# creating a new dataframe with updated lists
new_df = pd.DataFrame({'Instrument':Instrument, 'Date':Date, 'Time':Time, 'RecordType':RecordType, 'Price':Price, 'Volume':Volume, 'Qualifiers':Qualifiers, 'OrderID':OrderID, 'MatchID':MatchID, 'Side':Side}).reset_index(drop=True)
new_df['RecordType']=np.where(new_df['RecordType'] =='O', 'CONTROL', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='A', 'ENTER', new_df['RecordType'])
new_df['RecordType']=np.where(new_df['RecordType'] =='D', 'DELETE', new_df['RecordType'])
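The three np.where calls above could probably be collapsed into a single lookup. A minimal sketch, assuming 'O', 'A' and 'D' are the only codes that need renaming; the sample Series and the name code_names are made up for illustration:

```python
import pandas as pd

# Hypothetical sample values; 'AMEND' stands for a row already relabelled by the loop.
record_type = pd.Series(['O', 'A', 'AMEND', 'D'])

# Codes to rename; any value not listed here is left untouched.
code_names = {'O': 'CONTROL', 'A': 'ENTER', 'D': 'DELETE'}

renamed = record_type.replace(code_names)
print(renamed.tolist())
```

Because 'AMEND' is not a key in the mapping, rows relabelled earlier pass through unchanged.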
However, I have many different Symbol and Date values and wish to use groupby in the for loop. I tried
grouped = df.groupby(['Symbol', 'Date'])
and replaced df with grouped, but it didn't work. Also, I realize that my code is index-sensitive, i.e., the index must start at zero for the for loop to work. I'm not sure whether groupby will cause index problems.
Please help.
Thank you.
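One possible approach, sketched under assumptions rather than tested on the real data: instead of looping, compare each row with its neighbour inside the same (Symbol, Date) group using groupby(...).shift. The sample DataFrame below is invented purely for illustration, the names next_rows, prev_rows, is_dropped and is_amend are hypothetical, and the sketch assumes the rows are already in chronological order within each group.

```python
import pandas as pd

# Hypothetical sample data, invented only to illustrate the rule.
df = pd.DataFrame({
    'Symbol':      ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Date':        ['2020-01-02'] * 6,
    'Time':        ['09:00:00', '09:00:01', '09:00:01', '09:00:05', '09:00:05', '09:00:05'],
    'OrderID':     [1, 1, 1, 2, 2, 2],
    'MessageType': ['A', 'D', 'A', 'D', 'A', 'A'],
})

cols = ['Time', 'OrderID', 'MessageType']
grouped = df.groupby(['Symbol', 'Date'], sort=False)

# Neighbouring rows within the same (Symbol, Date) group.
# The first/last row of each group gets NaN, so pairs never cross a group boundary.
next_rows = grouped[cols].shift(-1)
prev_rows = grouped[cols].shift(1)

# 'D' rows that are immediately followed by an 'A' with the same Time and OrderID.
is_dropped = (
    df['MessageType'].eq('D')
    & next_rows['MessageType'].eq('A')
    & df['Time'].eq(next_rows['Time'])
    & df['OrderID'].eq(next_rows['OrderID'])
)

# 'A' rows that immediately follow such a 'D'.
is_amend = (
    df['MessageType'].eq('A')
    & prev_rows['MessageType'].eq('D')
    & df['Time'].eq(prev_rows['Time'])
    & df['OrderID'].eq(prev_rows['OrderID'])
)

result = df.copy()
result.loc[is_amend, 'MessageType'] = 'AMEND'
result = result[~is_dropped].reset_index(drop=True)
print(result)
```

Since shift works by position within each group and the masks share the index of df, this sketch does not rely on the index starting at zero. In the sample, the 'D' at the end of AAA is kept because the following 'A' belongs to BBB.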