0

Edit: Solutions posted in this notebook. Special thanks to Étienne Célèry and ifly6!


I am trying to figure out how to beat the feared error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

import pandas as pd

d = {
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300]
}
df = pd.DataFrame(data=d)
df_2 = df.copy()  # for use in the second, non-lambda part
print(df)

And this outputs:

          nickname state  score
0           bobg89    NY    100
1        coolkid34    CA    200
2  livelaughlove38    TN    300

Then the goal is to add 50 to the score if they are from NY.

def add_some_love(state_value, score_value, name):
    if state_value == name:
        return score_value + 50
    else:
        return score_value

Then we can apply that function with a lambda function.

df['love_added'] = df.apply(lambda x: add_some_love(x.state, x.score, 'NY'), axis=1)
print(df)

And that gets us:

          nickname state  score  love_added
0           bobg89    NY    100         150
1        coolkid34    CA    200         200
2  livelaughlove38    TN    300         300

Next I tried writing it without the lambda, and that's where I get the error.

@MSeifert's answer here seems to explain why this happens: the function is looking at a whole column instead of a single row's value. But I thought passing axis=1 to the .apply() method would apply the function row-wise and fix the problem.

So I then do this:

df_2['love_added'] = df_2.apply(add_some_love(df_2.state, df_2.score, 'NY'), axis=1)
print(df_2)

And then you get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

So I've tried these solutions, but I can't seem to figure out how to rewrite the add_some_love() function so that it can run properly without the lambda function.

Does anyone have any advice?

Thanks so much for your time and consideration.

George Hayward

2 Answers

2

What you could do instead is use np.where:

df['score'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])

This would produce the same outcome as your applied function while also being much more performant.
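
As a minimal sketch of that approach, assuming the same three-row frame from the question: writing the result to a new love_added column (rather than overwriting score) is a choice made here so the output lines up with the lambda version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

# Elementwise choice: where the condition is True take score + 50,
# otherwise keep score unchanged. No Python-level loop over rows.
df['love_added'] = np.where(df['state'] == 'NY', df['score'] + 50, df['score'])
print(df)
```

The original score column is left untouched, so the two columns can be compared side by side.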


The issue with dropping the lambda is that you are no longer passing rows to your function. Your call runs add_some_love once, up front, on the whole columns df['state'] and df['score'], because that's what you told the computer to do.

What's going on in your function is the computer asking:

# if state_value == name ...
if df['state'] == 'NY':
    ...

Which naturally raises your error, because df['state'] == 'NY' is a Series of booleans and not a single boolean, as the if statement needs.
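
To see this in isolation, here is a small sketch (the two-column frame is a cut-down stand-in for the one in the question). It shows that the comparison yields one boolean per row, and that putting that result in an if is what triggers the ValueError.

```python
import pandas as pd

df = pd.DataFrame({'state': ['NY', 'CA', 'TN'], 'score': [100, 200, 300]})

# Comparing a whole column gives one boolean per row, not a single bool.
mask = df['state'] == 'NY'
print(mask.tolist())

# Python's `if` needs exactly one truth value, so pandas refuses to guess.
raised = False
try:
    if mask:
        pass
except ValueError as exc:
    raised = True
    print(exc)
```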

ifly6
  • Thanks so much @ifly6, and I totally agree, but I'm trying to figure out how to jam this into a .apply() method, and not use the lambda function (I've already gotten that to work). Do you know how I fix the `add_some_love()` function, so I can give the computer the logic one row at a time? I thought by passing `axis=1` into the `.apply()` method it was already doing that, but I guess not – George Hayward Mar 24 '22 at 19:29
  • and totally agree that `np.where()` is most elegant... just trying to see if there's a way to solve the issue for the .apply() method.... I've also always learned that 99.99% of the time, if you can write the thing in lambda syntax, then you can write the thing in def f(x) syntax, but I can't seem to find a workaround at the moment for the 'split the column into one row at a time' issue – George Hayward Mar 24 '22 at 19:32
  • Or maybe there's no way out of using the `lambda` function, because it's even called in the original Pandas documentation [here](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) – George Hayward Mar 24 '22 at 19:39
  • Oh, you can, you just also can't do it with parameters in that framework. Pass the function itself `foo` with signature `foo(the_whole_row)`. Then extract the state and scores etc in the function, do your changes, and pass the results back. It's not wholly necessary. – ifly6 Mar 25 '22 at 15:25
1

Your add_some_love function needs a single string as input in order to execute the comparison if state_value == name. When you apply a lambda function over a DataFrame with axis=1, each row's values are passed to the function one at a time, instead of the whole Series.

You can't use that exact add_some_love function without a lambda function.

If you still want to use apply(), try this function:

def add_some_love(row, name):
    if row.state == name:
        row.score = row.score + 50
    return row

df_2 = df_2.apply(add_some_love, axis=1, args=('NY',))
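
If the goal is a new column rather than an updated score, one possible variant is to return a single value per row and assign the resulting Series. This is only a sketch: love_for_row and its bonus parameter are hypothetical names that do not appear in the question.

```python
import pandas as pd

df_2 = pd.DataFrame({
    'nickname': ['bobg89', 'coolkid34', 'livelaughlove38'],
    'state': ['NY', 'CA', 'TN'],
    'score': [100, 200, 300],
})

def love_for_row(row, name, bonus=50):
    """Hypothetical helper: return the adjusted score for one row."""
    if row['state'] == name:
        return row['score'] + bonus
    return row['score']

# With axis=1, apply() hands each row to the function as its first argument.
# Because the function returns a scalar, the result is a Series we can assign.
df_2['love_added'] = df_2.apply(love_for_row, axis=1, args=('NY',))
print(df_2)
```

Returning a scalar keeps the original columns as they were and avoids mutating the row inside the function.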

However, a boolean mask with .loc avoids the row-by-row Python loop entirely, so it should be the fastest and most efficient:

FILTER = df_2['state'] == 'NY'
df_2.loc[FILTER, 'score'] = df_2.loc[FILTER, 'score'] + 50
marc_s
  • No rush on this, but syntax-wise do you know why what you wrote works: `.apply(add_some_love, axis=1, args=('NY',))` whereas this won't work: `.apply(add_some_love(name ='NY'), axis=1)`? How does Pandas know that the first argument is the row, and the next is 'NY' in your original masterpiece? No rush on this b/c you have cracked it! The other thing I'm trying to figure out is: is there a way to write your function in such a way that you can create a new column... but this definitely shows me why you'd want to use `lambda` here if you can... thank you! – George Hayward Mar 24 '22 at 19:53
  • Found a way: ` def add_some_love_non_lambda_new(row, new_col, name, add): if row.state == name: row[new_col] = row.score + add return row ` – George Hayward Mar 24 '22 at 20:03
  • Thanks, just glad I can help. `.apply(add_some_love(name='NY'))` won't work because that calls the function right away, without a row (remember the function has 2 arguments). Have a look at the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html You can see that `.apply` needs a function (without calling it, just passing the function as an object); `args` are separate. There you include any arguments **besides** the first one. `add_some_love` has 2 arguments (the first one being the row), so with `axis=1` `apply` passes each row automatically. The second one (`name`) needs to be included manually. – Étienne Célèry Mar 24 '22 at 20:06
  • Thanks so much - I've loaded this all and credited you in this notebook -> https://github.com/ghayward/pandas_apply_lambda_or_not/blob/main/Pandas%20Apply%20and%20Lambda%20Usage.ipynb – George Hayward Mar 24 '22 at 20:15
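
The point raised in the comments about passing the function object instead of calling it can be sketched as follows. The helper score_with_love and its target_state parameter are hypothetical names chosen for this illustration; the frame is a cut-down version of the one in the question. The sketch shows three ways to hand extra arguments to apply() without a lambda, and what happens when the function is called too early.

```python
from functools import partial

import pandas as pd

df = pd.DataFrame({'state': ['NY', 'CA', 'TN'], 'score': [100, 200, 300]})

def score_with_love(row, target_state):
    # Hypothetical row-wise helper: returns one adjusted score per row.
    if row['state'] == target_state:
        return row['score'] + 50
    return row['score']

# 1. Extra positional arguments go in `args`; the row is always passed first.
via_args = df.apply(score_with_love, axis=1, args=('NY',))

# 2. Extra keyword arguments given to apply() are forwarded to the function.
via_kwargs = df.apply(score_with_love, axis=1, target_state='NY')

# 3. functools.partial pre-binds the argument, leaving a one-argument callable.
via_partial = df.apply(partial(score_with_love, target_state='NY'), axis=1)

# Calling the function yourself runs it immediately, before apply() ever
# sees it, and there is no row to give it at that point.
called_too_early = False
try:
    df.apply(score_with_love(target_state='NY'), axis=1)
except TypeError as exc:
    called_too_early = True
    print(exc)
```

All three working forms give the same Series; which one reads best is mostly a matter of taste.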