0

I am trying to create a function with multiple conditions. This would add new column with a calculated amount. Part of it is below.

I have looked at other answers but can't work out what is wrong.

# import modules
import pandas as pd
import numpy as np

#data
data2 = pd.DataFrame({'Unit of Measure' : ['EA', 'EA', 'AA'],
                     'round' : ['no', 'yes', 'no'],
                     'add_gst' : ['no', 'yes', 'no'],
                     'Unit Charge' : [1.8, 2.5, 3.0],
                     'Time Units' : [1.0, 1.5, 2.0]})

# function
def include_gst(df):
    rounded = 0.0
    if(data2['Unit of Measure'] == 'EA' and data2['round'] == 'no' and data2['add_gst'] == 'yes'):
        rounded = data['Unit Charge'] * data['Time Units']
        rounded = rounded.round(2) # to two decimal places
        rounded = rounded * 1.10 # add gst
        rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
        rounded = rounded.round() # round to nearest five cents - step 2 - to nearest integer
        rounded = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05
        return  rounded
    else:
        return 1.0

data2['incl_gst'] = data2.apply(include_gst, axis = 1)

Error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

It would be good to get it to work. Thanks.

william3031
  • 1,653
  • 1
  • 18
  • 39
  • Does this answer your question? [Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o) – AMC Dec 11 '19 at 06:33
  • How much effort did you put into solving this? It’s an extremely common problem, there should be tons of resources available. On top of reading the Pandas docs, I would recommend this article: https://ericlippert.com/2014/03/05/how-to-debug-small-programs/. – AMC Dec 11 '19 at 06:35
  • Thanks. I did look up the error here (the article you suggested) before I posted. I couldn't work it out. If you look at the response to the answer below, I worked out that I had `data` instead of `df` and there is no way I would have worked that out from just reading the article. I worked on it for about 20 mins before posting, cutting out parts of the function and trying the other if else methods. I don't usually program in Python and I was stuck. Good enough for you? – william3031 Dec 11 '19 at 11:17

1 Answers1

1

Here under the hood loops by apply are not necessary, you canvectorized solution - added parantheses because priority operator with & for bitwise AND to mask and new column is created by numpy.where:

#changed data for match first row
data2 = pd.DataFrame({'Unit of Measure' : ['EA', 'EA', 'AA'],
                     'round' : ['no', 'yes', 'no'],
                     'add_gst' : ['yes', 'yes', 'no'],
                     'Unit Charge' : [1.8, 2.5, 3.0],
                     'Time Units' : [1.0, 1.5, 2.0]})
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units
0              EA    no     yes          1.8         1.0
1              EA   yes     yes          2.5         1.5
2              AA    no      no          3.0         2.0
  Unit of Measure round add_gst  Unit Charge  Time Units  

m = (data2['Unit of Measure'] == 'EA') & (data2['round'] == 'no') & (data2['add_gst'] == 'yes')
rounded = data2['Unit Charge'] * data2['Time Units']
rounded = rounded.round(2) # to two decimal places
rounded = rounded * 1.10 # add gst
rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
rounded = rounded.round() # round to nearest five cents - step 2 - to nearest integer
rounded = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05


data2['incl_gst'] = np.where(m, rounded, 1.0)
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units  incl_gst
0              EA    no     yes          1.8         1.0       2.0
1              EA   yes     yes          2.5         1.5       1.0
2              AA    no      no          3.0         2.0       1.0

EDIT:

Because 12 times elif statement solution should be cahnged:

def include_gst(df):
    rounded = 0.0
    if (df['Unit of Measure'] == 'EA') and (df['round'] == 'no') and (df['add_gst'] == 'yes'):
        rounded = df['Unit Charge'] * df['Time Units']
        rounded = round(rounded, 2) # to two decimal places
        rounded = rounded * 1.10 # add gst
        rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
        rounded = round(rounded) # round to nearest five cents - step 2 - to nearest integer
        rounded = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05
        return  rounded
    else:
        return 1.0

data2['incl_gst'] = data2.apply(include_gst, axis = 1)
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units  incl_gst
0              EA    no     yes          1.8         1.0       2.0
1              EA   yes     yes          2.5         1.5       1.0
2              AA    no      no          3.0         2.0       1.0

Or:

data2 = pd.DataFrame({'Unit of Measure' : ['EA', 'EA', 'AA'],
                     'round' : ['no', 'yes', 'no'],
                     'add_gst' : ['yes', 'yes', 'no'],
                     'Unit Charge' : [1.8, 2.5, 3.0],
                     'Time Units' : [1.0, 1.5, 2.0]})
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units
0              EA    no     yes          1.8         1.0
1              EA   yes     yes          2.5         1.5
2              AA    no      no          3.0         2.0

m1 = (data2['Unit of Measure'] == 'EA') & (data2['round'] == 'no') & (data2['add_gst'] == 'yes')
m2 = (data2['Unit of Measure'] == 'EA') & (data2['round'] == 'yes') & (data2['add_gst'] == 'yes')

rounded = data2['Unit Charge'] * data2['Time Units']
rounded = rounded.round(2) # to two decimal places
rounded = rounded * 1.10 # add gst
rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
rounded = rounded.round() # round to nearest five cents - step 2 - to nearest integer
rounded1 = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05
rounded2 = rounded * 0.08

data2['incl_gst'] = np.select([m1, m2], [rounded1, rounded2], default=1.0)
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units  incl_gst
0              EA    no     yes          1.8         1.0      2.00
1              EA   yes     yes          2.5         1.5      6.56
2              AA    no      no          3.0         2.0      1.00
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks. So what if I want to have multiple conditions though. Would it be better in a function? For example an `elif` for `data2['Unit of Measure'] == 'EA') & (data2['round'] == 'no') & (data2['add_gst'] == 'no` – william3031 Dec 11 '19 at 05:28
  • Probably 12 `elif` – william3031 Dec 11 '19 at 05:30
  • @william3031 - added solution with apply and with [`numpy.select`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html), please check edit. – jezrael Dec 11 '19 at 05:42
  • So I think I just used `data` instead of `df` in the original function. Was that the only issue? – william3031 Dec 11 '19 at 05:55
  • @william3031 - It is necessary because you working with each row of dataframe separately, `df` is used because `def include_gst(df):` - there is used variable `df` – jezrael Dec 11 '19 at 05:57