The truth value of a Series is ambiguous - Error when calling a function

Question

I know the following error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

was asked about a long time ago.

However, I am trying to write a basic function that returns a new column, df['busy'], containing 1 or 0. My function looks like this:

def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0

I can define the function, but when I call it with the DataFrame, I get the error mentioned above. I followed two other threads to create that function, and I used & instead of and in my if clause.

Anyhow, when I do the following, I get my desired output.

df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
                        (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')

Any ideas on what mistake I am making in my hour_bus function?

`np.where` understands boolean arrays; `if` doesn't. It expects a scalar boolean result, so the condition is ambiguous, hence the error. An `if` also doesn't make much sense here, and changing `and` to `&` is irrelevant: you want to use the boolean mask to select which rows to overwrite — EdChum, Jul 17 '17 at 15:33
@EdChum thank you for quick response, i understand that and `np.where` works fine, but my error is in the function `hour_bus`. Any thoughts why? — i.n.n.m, Jul 17 '17 at 15:36
@EdChum Quick question, as I have in the function, there is no problem with returning a new column inside a function right? — i.n.n.m, Jul 17 '17 at 15:38
You're using `if`, which is the principal problem. If you added `all()`, `any()`, etc., it would become a scalar value. Also, returning `df['busy'] == 1` doesn't make sense either; that just returns a mask for the entire column — EdChum, Jul 17 '17 at 15:38

MSeifert · Accepted Answer · 2017-07-17T15:50:21.003

The expression

(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')

gives a boolean Series, and when you index your df with it you'll get a (probably) smaller part of your df.

Just to illustrate what I mean:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0    False
# 1    False
# 2     True
# 3     True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
#    a
# 2  3
# 3  4

However, it's still a DataFrame, so it's ambiguous to use it in an expression that requires a truth value (in your case, an if).

bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
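
To make such a condition unambiguous, the pandas object has to be reduced to a single boolean first. A minimal sketch using the same toy frame as above (the variable names any_match, all_match and no_rows are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2

# Using the Series directly as a condition raises the error from the question.
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Each of these reduces the mask to one plain boolean that `if` can handle.
any_match = bool(mask.any())   # True if at least one row matches
all_match = bool(mask.all())   # True only if every row matches
no_rows = df[mask].empty       # True if the filtered frame has no rows

print(any_match, all_match, no_rows)
# True False False
```

Note that these only answer a yes/no question about the whole frame. They do not assign a value per row, which is why a mask (or np.where) is what the question actually needs.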

You could use the np.where you already have, or, keeping the structure of your function:

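def hour_bus(df):
    # Element-wise boolean mask: True for weekday rows between 14:00 and 23:00
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
            (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    res = df['busy'] == 0  # literal translation of the else branch; needs an existing df['busy']
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res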

However, np.where is the better solution (it's more readable and probably faster).
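
As a rough sketch of how the np.where version could be wrapped in a function: the helper below builds the mask and returns the values for a new 0/1 column. The name add_busy_flag and the sample rows are made up for illustration, and it assumes 'hour' holds zero-padded 'HH:MM:SS' strings, so that string comparison matches chronological order.

```python
import numpy as np
import pandas as pd

def add_busy_flag(df):
    # Hypothetical helper: returns 1 for weekday rows between 14:00 and 23:00, else 0.
    mask = (
        (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
        & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    )
    # np.where picks 1 where the mask is True and 0 elsewhere, row by row.
    return np.where(mask, 1, 0)

# Made-up sample data, only to show the call.
sample = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '16:45:00', '23:30:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday'],
})
sample['busy'] = add_busy_flag(sample)
print(sample['busy'].tolist())
# [0, 1, 0, 0]
```

Unlike the snippet in the question, this returns integers rather than the strings '1' and '0'. Swap the last two arguments of np.where if strings are what you need.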

Thank you for your answer and explanation. One quick question, just to understand: does this mean that, as you and @EdChum noted in the comments, I cannot create a function and call it? — i.n.n.m, Jul 17 '17 at 15:48
I just left out the function part because it's not essential to the question and answer. You can always wrap it inside a function :) — MSeifert, Jul 17 '17 at 15:49
