
I am trying to use multiple conditions in my pandas code, but I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Following this thread, I replaced `and` with the bitwise operator `&`, but I am still getting the error.

import pandas as pd

d1 = {'Year': [2019, 2019, 2019, 2019, 2019],
      'Week': [1, 2, 4, 6, 7],
      'Value': [20, 40, 60, 75, 90]}

df1 = pd.DataFrame(data=d1)

if (df1['Year'] == df1['Year'].shift(-1)) & \
 (df1['Week'] == df1['Week'].shift(-1)):
    print('Yes')
else:
    print('No')

What might I be doing wrong here?

Shantanu
  • You're trying to use regular python boolean checks with Series objects. You can't do that. You're asking for an implicit boolean: What should Python say to `if [1, 2, 3] and [2, 3, 4]:`? That'd be `True` but it wouldn't tell you anything about individual rows (values in the list). – roganjosh Feb 01 '19 at 22:55
  • What is your actual desired output? – roganjosh Feb 01 '19 at 22:56
  • Thanks, @roganjosh. I get it now. I want to output 'No' for each row in the dataframe. – Shantanu Feb 01 '19 at 22:59
  • As in, a new column? If you want to iterate the DF and print things row-wise then it really doesn't fit well with Pandas and could be done with just regular lists. – roganjosh Feb 01 '19 at 22:59
  • Yes. A new column. – Shantanu Feb 01 '19 at 23:00

3 Answers


The comparison itself is correct, but it doesn't work with a regular Python `if` because pandas evaluates it in a vectorized manner: the result is a boolean Series with one value per row, not a single True or False. As I said in the comments regarding the error:

What should Python say to `if [1, 2, 3] and [2, 3, 4]:`? That'd be `True`, but it wouldn't tell you anything about individual rows (values in the list).
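To make the point concrete, here is a minimal sketch using the question's data. The names `mask`, `raised`, `any_match` and `all_match` are only illustrative. It shows that the `&` expression yields a Series, that asking for its truth value raises the error from the question, and that an explicit reduction gives a single answer if that is what is wanted:

```python
import pandas as pd

df1 = pd.DataFrame({'Year': [2019, 2019, 2019, 2019, 2019],
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]})

# Element-wise comparison: the result is a boolean Series, one value per row.
mask = (df1['Year'] == df1['Year'].shift(-1)) & (df1['Week'] == df1['Week'].shift(-1))

# `if mask:` asks for a single truth value, which pandas refuses to guess.
try:
    bool(mask)
    raised = False
except ValueError:
    raised = True

# An explicit reduction collapses the Series to one answer.
any_match = bool(mask.any())   # True if at least one row matches
all_match = bool(mask.all())   # True only if every row matches
```

Neither reduction says anything about individual rows, so they do not help when the goal is a new column.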

To get a result for each row, use `np.where`, which picks one of two values per row depending on the condition:

import numpy as np

# 'Yes' where both Year and Week equal the next row's values, otherwise 'No'
df1['comparison'] = np.where((df1['Year'] == df1['Year'].shift(-1)) &
                             (df1['Week'] == df1['Week'].shift(-1)), 'Yes', 'No')
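With the question's data every row comes out as 'No', so the effect is easier to see on a small example with a repeat in it. The frame `demo` below is hypothetical data, not from the question; it is only a sketch of the same technique:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): week 2 appears twice in a row.
demo = pd.DataFrame({'Year': [2019, 2019, 2019, 2019],
                     'Week': [1, 2, 2, 3]})

# Compare each row with the next one; shift(-1) leaves NaN in the last row.
same_year = demo['Year'] == demo['Year'].shift(-1)
same_week = demo['Week'] == demo['Week'].shift(-1)

# Pick a label per row from the combined boolean mask.
demo['comparison'] = np.where(same_year & same_week, 'Yes', 'No')
```

Only the second row gets 'Yes'. The last row is always 'No' because `shift(-1)` has nothing to compare it with, and NaN never compares equal.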
roganjosh
  • Is np.where always recommended to be used with pandas for conditional statement? – Shantanu Feb 01 '19 at 23:06
  • @Shantanu if you're trying to do row-wise comparisons without a `for` loop (which you should try to avoid if possible), then yes. But it's tough to answer that because pandas and numpy are _huge_ and there could be all sorts of tricks. If you're just starting out then yes, the simplest way is to think of it like this. – roganjosh Feb 01 '19 at 23:07
  • If you have one condition, it is the fastest way, especially if you want to yield different values depending on the result of the condition – yatu Feb 01 '19 at 23:07
  • @Wen-Ben wouldn't that drop into a python `for` loop on `map`? – roganjosh Feb 01 '19 at 23:27
  • @Wen-Ben but those docs are also suggesting `apply` which definitely does drop vectorization and go row-wise in "python time" so I suspect it would be much slower. I wouldn't personally go for that approach myself but you're free to post as an alternative :) – roganjosh Feb 01 '19 at 23:31
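`np.where` covers the two-outcome case discussed in these comments. When there are more than two outcomes, `np.select` is one vectorized option. The sketch below is an illustration only: the column name `next_row` and the labels 'repeat', 'consecutive' and 'gap' are made up for the example and are not part of the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Year': [2019, 2019, 2019, 2019, 2019],
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]})

next_week = df1['Week'].shift(-1)

# Conditions are checked in order; the first one that holds decides the label.
conditions = [
    df1['Week'] == next_week,       # the next row repeats this week
    df1['Week'] + 1 == next_week,   # the next row is the following week
]
labels = ['repeat', 'consecutive']  # hypothetical labels

# Rows matching no condition (including the last row, where next_week is NaN)
# fall back to the default.
df1['next_row'] = np.select(conditions, labels, default='gap')
```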

You could use `np.where`, which yields 'Yes' or 'No' for each row according to whether the condition is met:

import numpy as np

c1 = df1.Year == df1.Year.shift(-1)   # same year as the next row
c2 = df1.Week == df1.Week.shift(-1)   # same week as the next row
df1.loc[:, 'is_repeated'] = np.where(c1 & c2, 'Yes', 'No')

   Year  Week  Value is_repeated
0  2019     1     20          No
1  2019     2     40          No
2  2019     4     60          No
3  2019     6     75          No
4  2019     7     90          No
yatu

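I am not very sure about the bitwise operator, but to compare two Series as a whole you can use the `equals` method together with a logical `and`, which I think is easier. Note that `equals` returns a single boolean for the entire Series, so this gives one answer overall rather than one per row.

For example, you can change the `if` condition to: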

if df1['Year'].equals(df1['Year'].shift(-1)) and df1['Week'].equals(df1['Week'].shift(-1)):
    print('Yes')
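For comparison, a small sketch of what `equals` returns next to the element-wise `==`, using the question's data. The names `whole` and `per_row` are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'Year': [2019, 2019, 2019, 2019, 2019],
                    'Week': [1, 2, 4, 6, 7],
                    'Value': [20, 40, 60, 75, 90]})

year_next = df1['Year'].shift(-1)

# Series.equals compares the two Series as a whole and returns one boolean.
whole = df1['Year'].equals(year_next)

# == compares element by element and returns a boolean Series.
per_row = df1['Year'] == year_next
```

Here `whole` is False because the shifted Series ends in NaN, while `per_row` still records which individual rows match. That makes `equals` usable in a plain `if`, but not for building a per-row column.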
Koralp Catalsakal