I have a dataframe:

df = pd.DataFrame(np.random.randint(0,100,size=(15, 4)), columns=list('ABCD'))

I would like to add a Boolean (or Yes/No) column that indicates whether the sum of columns A and B is greater than 150.

I am trying a generator kind of solution:

df['Truth'] = ['Yes' for  i in df.columns.values if (df.A+df.B > 150)]

I know this does not work, but I keep getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I code this and what does this error mean?

Trenton McKinney
John Doe

1 Answer

How to get a column of Boolean values:

  • (df.A + df.B) > 150 generates a pandas.Series of Boolean values. Assign it to a column name.
import pandas as pd
import numpy as np

# sample data
np.random.seed(2)
df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))

# create the Boolean column
df['Truth'] = (df.A + df.B) > 150

# display(df)
     A   B   C   D  Truth
0   40  15  72  22  False
1   43  82  75   7  False
2   34  49  95  75  False
3   85  47  63  31  False
4   90  20  37  39  False
5   67   4  42  51  False
6   38  33  58  67  False
7   69  88  68  46   True
8   70  95  83  31   True
9   66  80  52  76  False
10  50   4  90  63  False
11  79  49  39  46  False
12   8  50  15   8  False
13  17  22  73  57  False
14  90  62  83  96   True
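
Since the question also mentions a Yes/No column, here is a minimal sketch of one way to turn the Boolean mask into labels. It uses a small hand-made frame (illustrative values, not the random data above) and `numpy.where`; the column name `Truth` follows the question.

```python
import numpy as np
import pandas as pd

# small hand-made frame (illustrative values, not the random sample above)
df = pd.DataFrame({'A': [40, 69, 90], 'B': [15, 88, 62]})

# elementwise comparison gives one Boolean per row
mask = (df['A'] + df['B']) > 150

# pick 'Yes' where the mask is True, 'No' otherwise
df['Truth'] = np.where(mask, 'Yes', 'No')

print(df)
```

`mask.map({True: 'Yes', False: 'No'})` should give the same labels without NumPy, if that reads better.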

What does this error mean:

  • What is shown in the question is a list-comprehension, not a generator.
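  • (df.A + df.B) returns a pandas.Series, which can be compared elementwise to a value like 150. The result is a Series with one Boolean per row.
    • The issue with the list comprehension is if (df.A+df.B > 150): if needs a single Boolean, but the comparison produces a whole Series of them, so pandas raises the ValueError instead of guessing.
  • Another issue is that df.columns.values is just an array of the column names (A, B, C, D), so the comprehension iterates over four column names, not over the 15 rows.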
  • See the question "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" for further details on the error.
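
The points above can be reproduced with a short sketch. The data are hand-made illustrative values, and the row-wise list comprehension at the end is just one possible way to keep the comprehension style from the question; the vectorized comparison is usually the simpler choice.

```python
import pandas as pd

# small hand-made frame (illustrative values)
df = pd.DataFrame({'A': [40, 69, 90], 'B': [15, 88, 62]})
mask = (df['A'] + df['B']) > 150  # Series of Booleans, one per row

# using the whole Series as an `if` condition raises the ValueError
try:
    if mask:
        pass
except ValueError as err:
    message = str(err)
    print(message)

# reduce the Series to a single Boolean explicitly when that is what is wanted
any_over = mask.any()  # is at least one row over 150?
all_over = mask.all()  # are all rows over 150?

# a comprehension that works iterates over the row sums, not the column names
labels = ['Yes' if total > 150 else 'No' for total in df['A'] + df['B']]
print(labels)
```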
Trenton McKinney