
I have a pandas DataFrame, and I created a mask that filters a series down to a specific operation type. Ultimately I am going to do some groupby aggregations, but before that I need to QC the data so that the groups are formed correctly. The problem is that my operation column has some random null values that I need to fill. For each null, I need to check the previous and next operations, and if both are the same, change the current row to that operation type.

Note: this is a time series dataframe

I am trying to optimize this for speed, since the final dataset will likely be large. I have tried apply and lambda functions, but they are not working correctly (probably user error).
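Because the data is a time series (rows in time order), the fill rule described above can be expressed without apply by comparing each row with its shifted neighbours. A minimal sketch, assuming the rows are already sorted by time; `fill_isolated_nulls` and the sample values are hypothetical, not from the original post:

```python
import pandas as pd

def fill_isolated_nulls(s: pd.Series) -> pd.Series:
    """Hypothetical helper: fill a null when the previous and next values are non-null and equal."""
    prev_val = s.shift(1)   # lag: value in the previous row
    next_val = s.shift(-1)  # lead: value in the next row
    # Fill only rows that are null AND whose two neighbours agree
    fillable = s.isna() & prev_val.notna() & prev_val.eq(next_val)
    # mask() replaces values where the condition is True with prev_val
    return s.mask(fillable, prev_val)

# Hypothetical sample data
ops = pd.Series(["Rotary Drill", None, "Rotary Drill", "Slide", None, "Trip"])
print(fill_isolated_nulls(ops).tolist())
```

A run of two or more consecutive nulls is left unfilled, because each of those nulls has a null neighbour; that follows the rule as stated, but it may or may not be what the QC step needs.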

import numpy as np
import pandas as pd

def prep_rot_data(ops_rot_cur, ops_rot_lag, ops_rot_lead):
    '''
    Pseudocode:
        if ops_rot_cur == 'rotate' then return 'rotate'
        elif ops_rot_cur is null and ops_rot_lag == 'rotate' and ops_rot_lead == 'rotate' then return 'rotate'
    '''
    if ops_rot_cur == 'rotate':
        return 'rotate'
    # pd.isna handles both None and NaN; `'na' in None` would raise TypeError
    elif pd.isna(ops_rot_cur) and ops_rot_lag == 'rotate' and ops_rot_lead == 'rotate':
        return 'rotate'
    return None

c_df['rig_ops_rot'] = np.where(c_df['operation'] == 'Rotary Drill', 'rotate', None)  # rotate ops mask
# Attempt that fails: inside apply(axis=1), x['rig_ops_rot'] is a single value, not a Series, so it has no .shift()
# c_df['rig_ops_rot_test'] = c_df.apply(lambda x: prep_rot_data(x['rig_ops_rot'], x['rig_ops_rot'].shift(1), x['rig_ops_rot'].shift(-1)), axis=1)

I am not sure what the most efficient way to do a multi-condition check in pandas is, so there may be a better way to perform this operation. I expect the operation to check both the lag (previous) row and the lead (next) row before deciding on the result.
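One common vectorized way to express several row-level conditions is `numpy.select`, with the lag and lead values taken from `Series.shift`. A sketch under the assumption that the nulls live in the question's `operation` column; the sample values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the 'operation' column name comes from the question
c_df = pd.DataFrame(
    {"operation": ["Rotary Drill", None, "Rotary Drill", "Slide", None, "Slide"]}
)

cur = c_df["operation"]
lag = cur.shift(1)    # previous row (missing for the first row)
lead = cur.shift(-1)  # next row (missing for the last row)

conditions = [
    cur.eq("Rotary Drill"),                                          # already rotating
    cur.isna() & lag.eq("Rotary Drill") & lead.eq("Rotary Drill"),  # null gap between two rotating rows
]
choices = ["rotate", "rotate"]

# np.select takes the choice of the first true condition per row, else the default
c_df["rig_ops_rot"] = np.select(conditions, choices, default=None)
print(c_df)
```

Here the null at index 4 sits between two "Slide" rows, so it stays unset: these conditions only cover the rotate case.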

help-ukraine-now
Tyler Hunt
    Please have a look at [How to make good, reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide some sample input data, and your preferred output based on that input so we can better understand your problem – G. Anderson Aug 06 '19 at 19:34

1 Answer


The output you want is not entirely clear, but multiple conditions can be handled with boolean masks on pandas.DataFrame objects, as your pseudocode suggests:

'''
    if ops_rot_cur == rotate then return 'rotate'
    elif ops_rot_cur == None and ops_rot_lag == 'rotate' and ops_rot_lead == 'rotate' then return 'rotate'
'''

You may try the following, assuming your DataFrame my_df contains the columns ops_rot_cur, ops_rot_lag and ops_rot_lead:

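import pandas as pd

my_df['rig_ops_rot'] = None
# na=False makes missing values count as "no match" instead of propagating NaN
cond1 = my_df['ops_rot_cur'].str.contains('rotate', na=False)
cond2 = (my_df['ops_rot_cur'].isna()
         & my_df['ops_rot_lag'].str.contains('rotate', na=False)
         & my_df['ops_rot_lead'].str.contains('rotate', na=False))
# Assign to the target column only; my_df[cond] = ... would overwrite every column of those rows
my_df.loc[cond1 | cond2, 'rig_ops_rot'] = 'rotate'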

Or, alternatively:

import pandas as pd

def get_rig_ops_rot(row):
    # row is a single record, so the lag/lead values must already be columns
    if row['ops_rot_cur'] == 'rotate':
        return 'rotate'
    if (pd.isna(row['ops_rot_cur']) and row['ops_rot_lag'] == 'rotate'
            and row['ops_rot_lead'] == 'rotate'):
        return 'rotate'
    return None

my_df['rig_ops_rot'] = my_df.apply(get_rig_ops_rot, axis=1)
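Both versions assume ops_rot_cur, ops_rot_lag and ops_rot_lead already exist as columns; a row-wise apply cannot see neighbouring rows by itself. A sketch of one way to build them with `Series.shift`, using hypothetical sample data. It starts from the raw `operation` column rather than the `np.where(..., 'rotate', None)` output in the question, on the assumption that a genuine null should stay distinguishable from a non-rotary operation: after that `np.where`, a "Slide" row between two rotary rows would also be None and would get filled.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's c_df
c_df = pd.DataFrame(
    {"operation": ["Rotary Drill", None, "Rotary Drill", "Slide", "Rotary Drill"]}
)

# Relabel only the rotary rows; nulls stay null and other operations keep their names
ops = c_df["operation"].replace("Rotary Drill", "rotate")

# Build the three columns the answer assumes, using shift for lag/lead
my_df = pd.DataFrame({
    "ops_rot_cur": ops,
    "ops_rot_lag": ops.shift(1),    # previous row
    "ops_rot_lead": ops.shift(-1),  # next row
})

my_df["rig_ops_rot"] = None
cond1 = my_df["ops_rot_cur"].str.contains("rotate", na=False)
cond2 = (my_df["ops_rot_cur"].isna()
         & my_df["ops_rot_lag"].str.contains("rotate", na=False)
         & my_df["ops_rot_lead"].str.contains("rotate", na=False))
my_df.loc[cond1 | cond2, "rig_ops_rot"] = "rotate"
print(my_df)
```

The "Slide" row at index 3 is left unset, while the null at index 1 is filled.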