I have a pandas DataFrame, and I created a mask that filters a series down to a specific operation type. Ultimately I am going to do some groupby aggregation(s), but first I need to QC the data so that the groups are formed correctly. The problem is that my operation category has some random null values that I need to fill in: for each null, I need to check the previous and next operation, and if both are the same, change the current node (row) to that operation type.
Note: this is a time series DataFrame.
I am trying to optimize this method for speed, as the final dataset will likely be large. I have tried apply and lambda functions, but they are not working correctly, probably due to user error.
def prep_rot_data(ops_rot_cur, ops_rot_lag, ops_rot_lead):
    # pseudo code
    '''
    if ops_rot_cur == rotate then return 'rotate'
    elif ops_rot_cur == None and ops_rot_lag == 'rotate' and ops_rot_lead == 'rotate' then return 'rotate'
    '''
    if 'rotate' in ops_rot_cur:
        return 'rotate'
    elif 'na' in ops_rot_cur and 'rotate' in ops_rot_lag and 'rotate' in ops_rot_lead:
        return 'rotate'
c_df['rig_ops_rot'] = np.where(c_df['operation'] == 'Rotary Drill','rotate', None) # rotate ops mask
#c_df['rig_ops_rot_test'] = c_df.apply(lambda x: prep_rot_data(x['rig_ops_rot'], x['rig_ops_rot'].shift(1), x['rig_ops_rot'].shift(-1)), axis=1)
I am not sure what the most efficient way to evaluate a multi-condition check in pandas is, so there may be a better way to perform this operation. I expect the operation to check both the lag node and the lead node before returning the result.
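
For reference, the lag/lead check described above could be expressed without apply, roughly as in the sketch below. It is only an illustration under a few assumptions: the sample rows are made up, the frame is already sorted in time order, the nulls live in the `operation` column, and `is_rot`, `prev_rot`, `next_rot` and `fill_gap` are hypothetical helper names. It shifts the whole boolean mask once in each direction instead of calling `.shift()` on a scalar inside a row-wise lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; assumed to be sorted by time already.
c_df = pd.DataFrame({
    "operation": [
        "Rotary Drill", None, "Rotary Drill",
        None, "Slide Drill", None, "Rotary Drill",
    ],
})

# True where the row is a rotary drilling operation (nulls compare as False).
is_rot = c_df["operation"].eq("Rotary Drill")

# Lag and lead versions of the mask; the first/last rows have no neighbour,
# so they are filled with False.
prev_rot = is_rot.shift(1, fill_value=False)
next_rot = is_rot.shift(-1, fill_value=False)

# A null row whose previous AND next rows are both rotary.
fill_gap = c_df["operation"].isna() & prev_rot & next_rot

# Same np.where pattern as the original mask, with the gap rows included.
c_df["rig_ops_rot"] = np.where(is_rot | fill_gap, "rotate", None)
```

With this sample, only the null that sits between two 'Rotary Drill' rows is filled; the nulls next to 'Slide Drill' are left alone. Note that this only handles single-row gaps, since a run of two or more consecutive nulls never has a rotary row on both sides.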