1

I have a pandas DataFrame. I would like to add a new column, y, with a value of 1 if the previous value of x is less than 50 and the current value is more than 50, and 0 otherwise.

I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

code:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(40,60,size=(10, 1)), columns=['x'])

df['y'] = 1 if (df['x'].shift(1) < 50) and (df['x'] > 50) else 0
Barmar
user2242044

2 Answers

4

This is an old story. pandas overloads the bitwise operators & and |, and you should use those for vectorised AND/OR operations. Also, given what your code does, an astype conversion is more appropriate here than a conditional expression.

In [139]: df['y'] = ((df['x'].shift(1) < 50) & (df['x'] > 50)).astype(np.int8)

In [140]: df
Out[140]:
    x  y
0  51  0
1  51  0
2  48  0
3  54  1
4  47  0
5  41  0
6  51  1
7  49  0
8  53  1
9  41  0

Or, a less common option using DataFrame.eval:

In [146]: df.eval("(x.shift() < 50 and x > 50) * 1", inplace=False, engine='python')
Out[146]:
0    0
1    0
2    0
3    1
4    0
5    0
6    1
7    0
8    1
9    0
Name: x, dtype: int32
MaxU - stand with Ukraine
3

The and operator cannot be overridden by Python classes. It takes two expressions as input and evaluates the first one. If the truthiness of the first is True, it evaluates the second and returns that result. If the truthiness is False, it returns the result of the first expression. A pandas Series refuses to be reduced to a single truth value, so that truthiness check is what raises the ValueError.
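A minimal sketch of that behaviour, using made-up values and a hypothetical two-element Series rather than the data from the question:

```python
import pandas as pd

# `and` returns one of its operands, chosen by the truthiness of the first.
falsy_first = 0 and "second operand"    # first is falsy, so the first is returned
truthy_first = 1 and "second operand"   # first is truthy, so the second is returned
print(falsy_first, truthy_first)

# A Series has no single truth value, so `and` fails as soon as it
# tries to evaluate the truthiness of its first operand.
s = pd.Series([True, False])
try:
    s and s
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The error is raised before the second operand is ever looked at, which is why it appears regardless of what is on the right-hand side.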

What you can use here are the operators & (element-wise and) and | (element-wise or), which pandas overloads for Series. We can thus rewrite it as:

df['y'] = ((df['x'].shift(1) < 50) & (df['x'] > 50)).astype(int)

Here we use astype to convert the Series of booleans to a Series of integers. False maps to 0 and True maps to 1.
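To make the steps easier to follow, here is a small sketch with fixed values instead of random ones. The sample data and the intermediate names prev_below and curr_above are only illustrative and not part of the original code:

```python
import pandas as pd

# Fixed sample data so the result is reproducible (illustrative values).
df = pd.DataFrame({"x": [45, 55, 60, 48, 52, 51]})

# shift(1) puts NaN in the first row; NaN < 50 evaluates to False.
prev_below = df["x"].shift(1) < 50
curr_above = df["x"] > 50

# Each comparison needs its own parentheses when written inline,
# because & binds more tightly than < and >.
df["y"] = (prev_below & curr_above).astype(int)
print(df)
```

Rows 1 and 4 get y = 1, because those are the only rows where the previous x is below 50 and the current x is above 50.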

Willem Van Onsem