1

I am trying to create a calculated column in a pandas data frame that runs a different calculation based on another column in the data frame.

First I tried:

df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight /100) / (prod_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'PRODUCING')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0)).where(df_rollup['completion_status'] == 'SHUT IN')
df_rollup['modeled_days'] = abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0)).where(df_rollup['completion_status'].str.contains('ABANDONED'))

I quickly realized that this would overwrite every row with the last updates calculatation and replace the non matching rows to Nan.

So I researched another approach that I believe is on the right track but I receive the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This is my approach:

def production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance):

if df_rollup['completion_status'] == 'PRODUCING':
    return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (prod_tolerance / 100), 0))
elif df_rollup['completion_status'] == 'SHUT IN':
    return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
elif df_rollup['completion_status'].str.contains('ABANDONED'):
    return abs(round((df_rollup.risk_avg - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
else:
    return 0

I ran this function using this the .apply method as such:

df_rollup['modeled_days'] = df_rollup.apply(production_type_calc(df_rollup, master_weight, prod_tolerance, shutin_tolerance, abandoned_tolerance), axis=1)

I have ran into this problem before and it seems like I need to nest the data frame e.g df = df[df[''] or something of the sort but I don't know how to begin. I would appreciate any help on this.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
Casey
  • 25
  • 2
  • you need `np.select` see this answer by Ubuntu https://stackoverflow.com/a/19913845/9375102 – Umar.H Jun 20 '20 at 00:35

1 Answers1

1

I still like your first method , and we can do the np.select

con1=df_rollup['completion_status'] == 'PRODUCING'
con2=df_rollup['completion_status'] == 'SHUT IN'
con3=df_rollup['completion_status'].str.contains('ABANDONED')
v1=abs(round(((df_rollup.risk_avg) - 31) / (master_weight /100) / (prod_tolerance / 100), 0))
v2=abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (shutin_tolerance / 100), 0))
v3=abs(round(((df_rollup.risk_avg) - 31) / (master_weight / 100) / (abandoned_tolerance / 100), 0))
df_rollup['modeled_days']=np.select([con1, con2, con3], [v1,v2,v3])
BENY
  • 317,841
  • 20
  • 164
  • 234