0

I have a pandas DataFrame with columns col1 and col2. I am trying to build col3 as:

df["col3"] = (df["col1"] == 1) | (df["col2"] == 1)

and it works. I tried to rewrite it as:

df["col3"] = any([df[c] == 1 for c in ["col1", "col2"]])

but I get the infamous ValueError: The truth value of a Series is ambiguous ...

I even tried to rewrite any( .. ) as pd.Series( .. ).any(), but it did not work.

How would you do it?

Ch3steR
marco
  • That's because `df[c] == 1` gives a Series object. You can't convert a Series object to bool. You'd get the same error when you run `bool(df['col1']==1)`. `any` checks if there's any *truthy* value in the iterable. – Ch3steR Nov 19 '21 at 09:23
  • *How would you do it?* The first option is pythonic, idiomatic and vectorized. I'd go with the first one. – Ch3steR Nov 19 '21 at 09:27
  • In addition to @Ch3steR's comments, the result of `any(...)` will be just one boolean: a single `True` or a single `False`. – Shayan Nov 19 '21 at 09:33

2 Answers

0

The simplest approach is to select the columns with a list, compare them to get a boolean DataFrame, and then call DataFrame.any:

(df[["col1", "col2"]] == 1).any(axis=1)

Your solution can be fixed by replacing any with np.logical_or.reduce:

np.logical_or.reduce([df[c] == 1 for c in ["col1", "col2"]])

Or a bit overcomplicated:

pd.concat([df[c] == 1 for c in ["col1", "col2"]], axis=1).any(axis=1)
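
To see the three variants side by side, here is a minimal sketch. The sample values in `df` are an assumption, since the question does not show the real data, and the names `expected`, `via_any`, `via_reduce` and `via_concat` are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question does not show the real frame.
df = pd.DataFrame({"col1": [1, 0, 2, 0], "col2": [0, 0, 1, 1]})
cols = ["col1", "col2"]

# Reference result from the question's working expression.
expected = (df["col1"] == 1) | (df["col2"] == 1)

# 1. Compare the selected columns at once, then reduce row-wise.
via_any = (df[cols] == 1).any(axis=1)

# 2. Element-wise OR across the list of masks.
via_reduce = np.logical_or.reduce([df[c] == 1 for c in cols])

# 3. Concatenate the masks as columns, then reduce row-wise.
via_concat = pd.concat([df[c] == 1 for c in cols], axis=1).any(axis=1)

df["col3"] = via_any
print(df)
```

All three agree with the original `|` expression on this data. The first and third return a Series aligned with `df`, which is usually the most convenient form for assigning to a new column.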
jezrael
-1

As was already explained in the comments, the built-in any function implicitly tries (and fails) to convert a Series to bool.
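
A small sketch that reproduces the failure, assuming a made-up two-row frame (the data and the names `masks` and `error_message` are illustrative only):

```python
import pandas as pd

# Hypothetical data, just enough to trigger the error.
df = pd.DataFrame({"col1": [1, 0], "col2": [0, 0]})
masks = [df[c] == 1 for c in ["col1", "col2"]]

try:
    # The built-in any() tests each element's truth value,
    # which means calling bool() on a whole Series.
    any(masks)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exception is raised on the very first element of the list, before any element-wise comparison between the columns could take place.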

If you want something similar to your second code snippet, you can use numpy's any function, which can reduce along a single axis instead of collapsing everything to one value.

import numpy as np
np.any([df[c] == 1 for c in ["col1", "col2"]], axis=0)

Alternatively, you could also extend your first code snippet to more columns by using functools.reduce:

import functools
functools.reduce(lambda a, b: a | b, [(df[c] == 1) for c in ['col1', 'col2']])
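
As a rough sketch of how this scales to more columns, the lambda can also be swapped for `operator.or_`, which applies the same `|` operator. The third column `col4` and the sample values are assumptions made up for illustration:

```python
import functools
import operator

import pandas as pd

# Hypothetical frame; "col4" is an invented extra column.
df = pd.DataFrame({"col1": [1, 0, 0], "col2": [0, 1, 0], "col4": [0, 0, 0]})
cols = ["col1", "col2", "col4"]

# Fold the per-column masks together with element-wise OR.
df["col3"] = functools.reduce(operator.or_, (df[c] == 1 for c in cols))
print(df)
```

Because each step is a Series `|` Series operation, the result stays a Series with the original index.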
maow