I know the following error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

was asked about a long time ago.

However, I am trying to write a basic function that returns a new column, df['busy'], containing 1 or 0. My function looks like this:

def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0

The function definition runs fine, but when I call it with the DataFrame, I get the error mentioned above. I followed this thread and another thread to create that function, and I used & instead of and in my if clause.

Anyhow, when I do the following, I get my desired output.

df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
                        (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')

Any ideas on what mistake I am making in my hour_bus function?

i.n.n.m
  • `np.where` understands boolean arrays; `if` doesn't. It expects a scalar boolean result, so the truth value is ambiguous, hence the error. It also doesn't make much sense here: changing `and` to `&` is irrelevant, and what you want is to use the boolean mask to select which rows to overwrite – EdChum Jul 17 '17 at 15:33
  • @EdChum thank you for the quick response. I understand that, and `np.where` works fine, but my error is in the function `hour_bus`. Any thoughts why? – i.n.n.m Jul 17 '17 at 15:36
  • @EdChum Quick question, as I have in the function, there is no problem with returning a new column inside a function right? – i.n.n.m Jul 17 '17 at 15:38
  • You're using `if`, which is the principal problem. If you added `all()`, `any()` etc., it would become a scalar value. Returning `df['busy'] == 1` doesn't make sense either; that just returns a mask for the entire column – EdChum Jul 17 '17 at 15:38

1 Answer

The expression

(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')

gives a boolean Series (a mask), and when you index your df with it you get a (probably) smaller subset of your df's rows.

Just to illustrate what I mean:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0    False
# 1    False
# 2     True
# 3     True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
#    a
# 2  3
# 3  4

However, the result is still a DataFrame, so it's ambiguous to use it in an expression that requires a single truth value (in your case, an if).

bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
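
To make the error message's suggestions concrete, here is a minimal sketch that reuses the toy column `a` from the illustration above (the variable names `any_match`, `all_match`, and `no_match` are made up for this example). Reducing the mask with any() or all(), or checking .empty on the indexed frame, gives a single bool that an if can consume. Note that this answers "does any row match?" for the whole frame, which is not the per-row 1/0 column the question is after.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
mask = df['a'] > 2

# Each reduction collapses the boolean Series to a single truth value.
any_match = bool(mask.any())   # at least one row satisfies the condition
all_match = bool(mask.all())   # every row satisfies the condition
no_match = df[mask].empty      # the filtered DataFrame has no rows

if any_match:
    print('at least one row has a > 2')
```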

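You could use the np.where you already have, or write the masking out by hand. The version below mirrors the comparisons from your function, so it assumes df already has a 'busy' column: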

def hour_bus(df):
    # Boolean mask: True for rows between 14:00 and 23:00 that are not on a weekend
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
            (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    res = df['busy'] == 0
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res

However, np.where is the better solution (it's more readable and probably faster).
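
If you still want a function, one option is to wrap the np.where call and assign its result to the new column. This is only a sketch: the name `busy_flags` and the sample rows are made up for illustration, and it assumes `hour` holds zero-padded 'HH:MM:SS' strings so that string comparison orders the times correctly.

```python
import numpy as np
import pandas as pd


def busy_flags(df):
    """Return an array of 1/0 flags, one per row (hypothetical helper)."""
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
            (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    # np.where picks 1 where the mask is True and 0 elsewhere, element-wise
    return np.where(mask, 1, 0)


# Made-up sample data, only to show the call
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '16:45:00', '23:30:00'],
    'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday'],
})
df['busy'] = busy_flags(df)
print(df)
```

The np.where call in the question produces the strings '1' and '0'; integers are used here on the assumption that a numeric flag is what is wanted.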

MSeifert
  • Thank you for your answer and explanation. One quick question, just to understand: does this mean that, as you and @EdChum noted in the comments, I cannot create a function and call it? – i.n.n.m Jul 17 '17 at 15:48
  • I just left out the function part because it's not essential to the question and answer. You can always wrap it inside a function :) – MSeifert Jul 17 '17 at 15:49