Given a DataFrame df, if one wanted to filter the DataFrame to rows that have column_4 between 45 and 48, one might try the following code snippet:

import pandas as pd

df = pd.DataFrame({f'column_{i}': range(i*10, 10*i+10) for i in range(10)})

df2 = df[45 < df['column_4'] < 48]

However, this code raises a ValueError: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." Instead, one must write:

import pandas as pd

df = pd.DataFrame({f'column_{i}': range(i*10, 10*i+10) for i in range(10)})

df2 = df[(45 < df['column_4']) & (df['column_4'] < 48)]

Why is this the case, and is this a hacky way to filter a DataFrame?

This question seems to have a partial answer here; however, that answer does not specifically address why 45 < a < 48 does not work.

Seabody
  • Does not answer your question, but you can simplify your method here with: `df['column_4'].between(45, 48, inclusive=False)` – Erfan Jan 26 '21 at 00:02
  • [link](https://pandas.pydata.org/docs/user_guide/gotchas.html#using-if-truth-statements-with-pandas) may help – sammywemmy Jan 26 '21 at 00:06
  • Thanks, it pointed me in the right direction. Writing up the answer as a self-comment now. – Seabody Jan 26 '21 at 01:34

1 Answer

This has already been asked; I was just searching with the wrong terms.

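The short answer is that chained relational operators such as 1 < a < 4 are expanded internally as (1 < a) and (a < 4). The "and" and "or" operators cannot be overridden in Python (see PEP 335 and Guido's post to the mailing list). Instead, "and" asks its left operand for a single truth value, and a Series or array with more than one element refuses to give one, which is where the ValueError comes from. This means NumPy can't use chained relational operators as masks, and therefore neither can pandas. The & operator, by contrast, can be overloaded and works element-wise, so the version with & and parentheses is the normal way to build a mask rather than a hack.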
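To make the mechanism concrete, here is a minimal pure-Python sketch. The Mask class is a hypothetical stand-in for a Series, not pandas' actual implementation. It only assumes what is described above: comparisons return element-wise results, & can be overloaded, and asking for a single truth value raises an error.

```python
class Mask:
    """Hypothetical stand-in for a Series: element-wise comparisons, no single truth value."""

    def __init__(self, values):
        self.values = list(values)

    def __lt__(self, other):
        # Handles `self < other`, element by element.
        return Mask(v < other for v in self.values)

    def __gt__(self, other):
        # Handles `self > other`; Python also falls back to this for `other < self`.
        return Mask(v > other for v in self.values)

    def __and__(self, other):
        # `&` can be overloaded, unlike the `and` keyword.
        return Mask(a and b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # `and` calls this on its left operand to decide whether to short-circuit.
        raise ValueError("The truth value of a Mask is ambiguous.")


a = Mask([44, 46, 47, 49])

try:
    45 < a < 48  # evaluated as (45 < a) and (a < 48)
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# The explicit form never asks a Mask for a single truth value.
combined = (45 < a) & (a < 48)

print(chained_error)
print(combined.values)
```

The chained form fails as soon as "and" tries to convert the result of 45 < a to a bool, before a < 48 is even evaluated, while the & form combines the two element-wise results directly.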

Seabody