Selecting rows from a Dataframe based on values in multiple columns in pandas

Question

This question is very related to another, and I'll even use the example from the very helpful accepted solution on that question. Here's the example from the accepted solution (credit to unutbu):

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                   'B': 'one one two three two two one three'.split(),
                   'C': np.arange(8), 'D': np.arange(8) * 2})
print(df)
#      A      B  C   D
# 0  foo    one  0   0
# 1  bar    one  1   2
# 2  foo    two  2   4
# 3  bar  three  3   6
# 4  foo    two  4   8
# 5  bar    two  5  10
# 6  foo    one  6  12
# 7  foo  three  7  14

print(df.loc[df['A'] == 'foo'])

yields

     A      B  C   D
0  foo    one  0   0
2  foo    two  2   4
4  foo    two  4   8
6  foo    one  6  12
7  foo  three  7  14

But what if I want to pick out all rows that include both 'foo' and 'one'? Here that would be row 0 and 6. My attempt at it is to try

print(df.loc[df['A'] == 'foo' and df['B'] == 'one'])

This does not work, unfortunately. Can anybody suggest a way to implement something like this? Ideally it would be general enough that there could be a more complex set of conditions in there involving and and or, though I don't actually need that for my purposes.

By the way, I've also linked this in a comment to the accepted solution on the original question so that others might find it also when accessing the original question. — Shane, Aug 01 '15 at 00:16

joris · Accepted Answer · 2015-07-31T23:22:50.617

22

There is only a very small change needed in your code: change the and with & (and add parentheses for correct ordering of comparisons):

In [104]: df.loc[(df['A'] == 'foo') & (df['B'] == 'one')]
Out[104]:
     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12

The reason you have to use & is that this will do the comparison element-wise on arrays, while and expect to compare two expressions that evaluate to True or False.
Similarly, when you want the or comparison, you can use | in this case.

edited Jul 31 '15 at 23:22

answered Jul 31 '15 at 23:08

joris

133,120
36
247
202

Thanks @joris. I've come from Matlab and trained myself to use `and` and `or` instead of `&&` and `||` so it would have to be the case that what I'm trying to do requires something that looks more like my original Matlab thinking. In any case, I appreciate the explanation and that your answer works for assignment also. – Shane Aug 01 '15 at 00:15

score 4 · Answer 2 · answered Jul 31 '15 at 23:14

4

You can do this with tiny altering in your code:

print(df[df['A'] == 'foo'][df['B'] == 'one'])

Output:

     A    B  C   D
0  foo  one  0   0
6  foo  one  6  12

answered Jul 31 '15 at 23:14

Geeocode

5,705
3
20
34

2

Just note that as long as you are only selecting data this is OK, but once you want to assign a new value to such selection, this chained indexing will not work anymore. Therefore I would recommend to use `&` and index in one pass. – joris Jul 31 '15 at 23:22
Thanks very much for this George. While both your answer and that of joris suffice for my purposes, I accepted joris' answer because he explains why `and` doesn't work and his answer allows assignment also. Still, I appreciate your suggestion and upticked it accordingly. – Shane Aug 01 '15 at 00:12
@Shane My pleasure Shane. Agree, he explained your case with and/&. By the way, I see this operand difference first too, so I learnt something new as well. So it's all right, and thanks for the uptick. :) – Geeocode Aug 01 '15 at 00:28

Selecting rows from a Dataframe based on values in multiple columns in pandas

2 Answers2

Linked