I have the following code.

import pandas as pd
data = {'income_bracket':['<=50k', '<=75k', '<=125k', '>1(25k']}
df = pd.DataFrame(data)
def label_fix(label):
    if df['income_bracket']== '<=50K':
        return 0
    else:
        return 1
df['income_bracket']=df['income_bracket'].apply(label_fix)

When I run the code, I get the following error.

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I would really appreciate any help here. Thanks

DYZ
Kartick
  • The problem is with your function. Use `label ==` for the comparison, not the entire column. – Eric Truett May 22 '20 at 16:44
  • This question should not have been closed for duplicate, at least not with a pointer to the linked answer. `pandas` throws that error for a lot of different reasons and bitwise vs. logical operations has nothing to do with this one – Randy May 22 '20 at 17:00

2 Answers

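Your bug is that label_fix never uses the argument passed to it; it compares the whole column instead. That said, you should avoid apply in Pandas for this kind of element-wise mapping, because it is much slower than a vectorized operation.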

Instead, do it in vectorized form:

import numpy as np
df['income_bracket'] = np.where(df['income_bracket'] == '<=50k', 0, 1)

If you have more than two cases you can use np.select() instead of np.where().
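
As a rough sketch of the np.select() route mentioned above, assuming each bracket should get its own integer code: the codes 0 to 3 and the column name income_code are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'income_bracket': ['<=50k', '<=75k', '<=125k', '>1(25k']})

# One boolean Series per case; np.select picks the first matching condition.
conditions = [
    df['income_bracket'] == '<=50k',
    df['income_bracket'] == '<=75k',
    df['income_bracket'] == '<=125k',
]
choices = [0, 1, 2]  # hypothetical codes, chosen only for this example

# Rows that match none of the conditions fall back to the default.
df['income_code'] = np.select(conditions, choices, default=3)
print(df)
```

Writing to a new column keeps the original labels around, which makes it easier to check that the mapping did what you expected.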

John Zwinck

import pandas as pd
data = {'income_bracket':['<=50k', '<=75k', '<=125k', '>1(25k']}
df = pd.DataFrame(data)
def label_fix(label):
    # label is a single value from the column, not the whole Series
    if label == '<=50k':
        return 0
    else:
        return 1
df['income_bracket']=df['income_bracket'].apply(label_fix)

This is the corrected version of your code snippet. Your function compared df['income_bracket'], which is the whole Series, so the if statement had no single True or False to test. Instead, use the 'label' argument that apply() passes to label_fix() for each element. Also note that the comparison string has to match the case used in the data: the data contains '<=50k', so comparing against '<=50K' would make every value 1.
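
To see where the error comes from, here is a small sketch (the names mask, error_message and fixed are only for illustration): comparing the column to a string gives a boolean Series, and Python cannot reduce that to a single True or False inside an if.

```python
import pandas as pd

df = pd.DataFrame({'income_bracket': ['<=50k', '<=75k', '<=125k', '>1(25k']})

# Comparing the whole column yields one boolean per row, not a single bool.
mask = df['income_bracket'] == '<=50k'

error_message = None
try:
    if mask:  # this is what the original label_fix effectively did
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

def label_fix(label):
    # Here label is one string from the column, so == gives a plain bool.
    return 0 if label == '<=50k' else 1

fixed = df['income_bracket'].apply(label_fix)
print(fixed.tolist())
```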

Mehul Gupta