I am trying to create a calculated column in a pandas data frame that runs a different calculation based on another column in the data frame.
First I tried:
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight /100) / (prod_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'PRODUCING')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'SHUT IN')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0)).where(df_rollup['completion_status'].str.contains('ABANDONED'))
I quickly realized that this would overwrite every row with the last updates calculatation and replace the non matching rows to Nan.
So I researched another approach that I believe is on the right track but I receive the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This is my approach:
def production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance):
if df_rollup['completion_status'] == 'PRODUCING':
return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (prod_tolerance / 100), 0))
elif df_rollup['completion_status'] == 'SHUT IN':
return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
elif df_rollup['completion_status'].str.contains('ABANDONED'):
return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
else:
return 0
I ran this function using this the .apply method as such:
df_rollup['modeled_days'] = df_rollup.apply(production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance), axis=1)
I have ran into this problem before and it seems like I need to nest the data frame e.g df = df[df[''] or something of the sort but I don't know how to begin. I would appreciate any help on this.