
I have a dataframe where one of the columns of type int is storing a binary flag pattern:

import pandas as pd

df = pd.DataFrame({'flag': [1, 2, 4, 5, 7, 3, 9, 11]})

I tried selecting rows with the 4 bit set the way it is typically done (with the bitwise AND operator):

df[df['flag'] & 4]

But it failed with:

KeyError: "None of [Int64Index([0, 0, 4, 4, 4, 0, 0, 0], dtype='int64')] are in the [columns]"

How do I actually select rows matching a binary pattern?

sophros

2 Answers


The bitwise-flag selection works as you’d expect:

>>> df['flag'] & 4
0    0
1    0
2    4
3    4
4    4
5    0
6    0
7    0
Name: flag, dtype: int64

However, if you pass this to df.loc[], you’re asking for the rows labelled 0 and 4 repeatedly, and if you use df[] directly, you’re asking for columns labelled 0 and 4, which don’t exist (hence the KeyError).
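A minimal sketch of both behaviours, using the frame from the question (the variable names `masked`, `by_label` and `raised` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'flag': [1, 2, 4, 5, 7, 3, 9, 11]})
masked = df['flag'] & 4   # integer Series: 0, 0, 4, 4, 4, 0, 0, 0

# .loc treats the integers as row labels, so rows 0 and 4 come back repeatedly
by_label = df.loc[masked, 'flag'].tolist()

# plain [] treats the integers as column labels, which do not exist here
try:
    df[masked]
    raised = False
except KeyError:
    raised = True
```

Neither call filters the rows, which is why the integer result has to be turned into a boolean mask first.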

Instead, you should force the conversion to a boolean indexer:

>>> (df['flag'] & 4) != 0
0    False
1    False
2     True
3     True
4     True
5    False
6    False
7    False
Name: flag, dtype: bool
>>> df[(df['flag'] & 4) != 0]
   flag
2     4
3     5
4     7
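As a possible variation on the comparison above, the integer result can also be cast with `astype(bool)`, which maps zero to False and any nonzero value to True. This is only a sketch of an equivalent spelling, not a different technique:

```python
import pandas as pd

df = pd.DataFrame({'flag': [1, 2, 4, 5, 7, 3, 9, 11]})

mask_ne = (df['flag'] & 4) != 0               # explicit comparison
mask_astype = (df['flag'] & 4).astype(bool)   # nonzero -> True, zero -> False

# Either mask can be used as a boolean indexer
selected = df[mask_astype]
```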
Cimbali

In pandas, & and | are commonly used as logical operators to combine conditions, but they are really bitwise operators: applying one to an integer Series yields a Series of numbers, not a Series of Boolean values.

Knowing that, you can use either of the following approaches to select rows based on a binary pattern:

  • Since the result of <int> & <FLAG> is either 0 or <FLAG> (for a single-bit flag), you can use:

    df[df['flag'] & 4 == 4]
    

    which (because & binds more tightly than == in Python) evaluates as:

    df[(df['flag'] & 4) == 4]

  • Alternatively, you can use apply to map each result directly to a bool:

    df[df['flag'].apply(lambda v: bool(v & 4))]
    

This is more cumbersome, though, and is likely to be much slower because apply calls a Python function for every element.

In either case, the result is as expected:

   flag
2     4
3     5
4     7
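One caveat worth sketching: for a single-bit flag such as 4, comparing with `== 4` and with `!= 0` select the same rows, but for a mask with several bits they differ. The mask value 3 below is a made-up example, not something from the question:

```python
import pandas as pd

df = pd.DataFrame({'flag': [1, 2, 4, 5, 7, 3, 9, 11]})
MASK = 3  # hypothetical two-bit mask (bits 1 and 2)

# Rows where at least one of the mask bits is set
any_bit = df[(df['flag'] & MASK) != 0]

# Rows where all of the mask bits are set
all_bits = df[(df['flag'] & MASK) == MASK]
```

Which comparison to use depends on whether "matching the pattern" means any of the bits or all of them.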
sophros