I'm working on a dataframe for which I need to create a large number of flags based on multiple conditions. I'm using np.where, but I'm now running into this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

For replicability and simplicity, I'm only sharing the part of the code that produces the error together with the columns that are used. Dataframe being used:

     Data  Uniques  day_a1  day_a2  day_a3
0       1        1       3     NaN     NaN
1       2        2      14    15.0     NaN
2       2        1      10    10.0     NaN
3       3        1      10    10.0    10.0
802     2        2      12     NaN    29.0
806     1        1      29     NaN     NaN

Code that generates the error:

df['flag_3.3.3.1.1'] = np.where(
    (
        (df['Data'] == 3) & 
        (df['day_a1'] != 10) & 
        (df['Uniques'] == 3) & #I ran this separately and it was fine
        (df['day_a1'] > 27 or df['day_a1'] < 4).any()),'flag',np.nan)

I still seem to have issues after adding .any() after the or condition.

Celius Stingher
  • http://idownvotedbecau.se/nodebugging/ Split your long, long statement into parts; if necessary, multiple statements. And see what each part produces. – ivan_pozdeev Oct 08 '19 at 21:36
  • That's a *really* complicated set of operations you have there *inside* your function call. Have you considered splitting it into multiple statements as preparation? Would help us readers and yourself to get a (much) better idea of what's actually going on – Energya Oct 08 '19 at 21:36
  • Who is the magician that solved this? Thanks a lot! I wasn't sure how to split them without breaking the code (I honestly tried). – Celius Stingher Oct 08 '19 at 21:40
  • `a= (df['Data'] == 3)`; `b=(df['day_a1'] != 10)`; etc; `x=a & b & c `; `np.where(x, '3.3.3.1.1', nan)`. – ivan_pozdeev Oct 08 '19 at 21:43
  • Editing to capture the error! – Celius Stingher Oct 08 '19 at 21:43
  • @CeliusStingher You're welcome :) If line-breaking code manually is too tricky, maybe look into running [`black`](https://black.readthedocs.io/en/stable/) on your code. Although writing code such that lines don't end up longer than 80-100 characters to begin with is usually a good start too – Energya Oct 08 '19 at 21:51

1 Answer

Try replacing

(df['day_a1'] > 27 or df['day_a1'] < 4)

by

((df['day_a1'] > 27) | (df['day_a1'] < 4))

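Note the use of | and the additional parentheses for precedence. Python's or tries to reduce each whole Series to a single True/False, which is what raises the ValueError, whereas | combines the two comparisons element by element. Because | binds more tightly than > and <, each comparison needs its own parentheses.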
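As a rough sketch of how this fits into the full statement, the snippet below rebuilds the sample frame from the question (only the columns the condition uses) and splits the mask into named parts, as the comments suggest. The names `out_of_range`, `mask`, and `or_raised` are made up for illustration, and the empty string used as the "no flag" value is an assumption made here so that both branches of np.where are strings; the question uses np.nan instead.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only the columns the mask uses).
df = pd.DataFrame(
    {
        "Data": [1, 2, 2, 3, 2, 1],
        "Uniques": [1, 2, 1, 1, 2, 1],
        "day_a1": [3, 14, 10, 10, 12, 29],
    },
    index=[0, 1, 2, 3, 802, 806],
)

# Python's `or` calls bool() on the whole Series, which pandas refuses.
try:
    df["day_a1"] > 27 or df["day_a1"] < 4
    or_raised = False
except ValueError:
    or_raised = True

# `|` works element by element; each comparison gets its own parentheses.
out_of_range = (df["day_a1"] > 27) | (df["day_a1"] < 4)

# Combine the named pieces with `&`, one condition per line.
mask = (
    (df["Data"] == 3)
    & (df["day_a1"] != 10)
    & (df["Uniques"] == 3)
    & out_of_range
)

# Assumption: "" stands in for the question's np.nan as the "no flag" value.
df["flag_3.3.3.1.1"] = np.where(mask, "flag", "")
```

On this sample, `out_of_range` is True only for rows 0 and 806, and no row satisfies all four conditions, so the new column ends up empty everywhere. No trailing .any() is needed: np.where expects one boolean per row, not a single value.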

Ami Tavory