
I have a DataFrame in pandas that I'd like to select a subset of rows from based on the values of two columns.

from pandas import DataFrame
test_df = DataFrame({'Topic': ['A','A','A','B','B'], 'Characteristic': ['Population','Other','Other','Other','Other'], 'Total': [25, 22, 21, 20, 30]})

It works as expected and returns the first row when I use this code:

bool1 = test_df['Topic']=='A' 
bool2 = test_df['Characteristic']=='Population'

test_df[bool1 & bool2]

But when I try to do it all in one line as below,

test_df[test_df['Topic']=='A' & test_df['Characteristic']=='Population']

I get "TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]"

Why? Is there a good way to do this in a single step?

eamcvey

1 Answer

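You only need to add parentheses. In Python, & binds more tightly than ==, so without them the one-liner is evaluated as test_df['Topic'] == ('A' & test_df['Characteristic']) == 'Population', and it is the 'A' & test_df['Characteristic'] step that raises the TypeError. With parentheses, each comparison is evaluated first and the two boolean Series are then combined: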

>>> test_df[(test_df['Topic']=='A') & (test_df['Characteristic']=='Population')]
  Characteristic Topic  Total
0     Population     A     25
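
If it helps to see the grouping rather than take it on faith, here is a small sketch using the standard-library ast module (Python 3.9+ for ast.unparse). The names topic and char are just hypothetical stand-ins for the two columns; no pandas is needed to inspect how the expression is parsed.

```python
import ast

# Stand-in names (hypothetical): topic and char play the role of
# test_df['Topic'] and test_df['Characteristic'].
unparenthesized = "topic == 'A' & char == 'Population'"
parenthesized = "(topic == 'A') & (char == 'Population')"

broken = ast.parse(unparenthesized, mode="eval").body
fixed = ast.parse(parenthesized, mode="eval").body

# Without parentheses the whole thing is ONE chained comparison with two
# == operators ...
is_chained = isinstance(broken, ast.Compare) and len(broken.ops) == 2

# ... and its middle operand is the bitwise AND of 'A' and the column.
middle = broken.comparators[0]
middle_is_and = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitAnd)

# With parentheses the top-level node is the & itself, joining two comparisons.
fixed_is_and = isinstance(fixed, ast.BinOp) and isinstance(fixed.op, ast.BitAnd)

print(is_chained, middle_is_and, fixed_is_and)
print(ast.unparse(middle))  # the sub-expression Python evaluates first
```

The same precedence rule applies to | and ~, so it is a reasonable habit to parenthesize every comparison inside a boolean mask.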

Alternatively, you could use the query method, to avoid the repetition of test_df:

>>> test_df.query("Topic == 'A' and Characteristic == 'Population'")
  Characteristic Topic  Total
0     Population     A     25
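
A third option, offered here as a sketch rather than as part of the original answer, is to use the Series.eq method instead of ==. Method calls bind more tightly than &, so no extra parentheses are needed. The DataFrame is rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question.
test_df = pd.DataFrame({
    'Topic': ['A', 'A', 'A', 'B', 'B'],
    'Characteristic': ['Population', 'Other', 'Other', 'Other', 'Other'],
    'Total': [25, 22, 21, 20, 30],
})

# .eq() is the method form of ==; each call returns a boolean Series,
# and & then combines the two Series element-wise.
mask = test_df['Topic'].eq('A') & test_df['Characteristic'].eq('Population')
result = test_df[mask]

print(result)
```

Whether this reads better than the parenthesized form is a matter of taste; it mainly helps when several conditions are chained on one line.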
DSM
  • I'm glad you included the query example. While it's 'only' syntactic sugar, it makes code *much* more readable. – JD Long Nov 25 '14 at 17:05