How to fllter a pandas DataFrame by multiple columns?

Question

I'm trying to subset a pandas DataFrame in python based on two logical statements

i.e.

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df[df.col1 = 1 and df.col2 = 3]

but I'm getting invalid syntax on line3.

Is there a way to do this in one line?

This question is duplicated. https://stackoverflow.com/questions/22086116/how-do-you-filter-pandas-dataframes-by-multiple-columns — Keiku, Feb 03 '18 at 06:31

score 4 · Accepted Answer · edited Feb 03 '18 at 04:17

4

You need to use logical operators. == is equals for returning boolean, = is setting a value.

Try:

df[(df.col1 == 1) & (df.col2 == 3)]

edited Feb 03 '18 at 04:17

AChampion

29,683
4
59
75

answered Feb 03 '18 at 04:11

Matt W.

3,692
2
23
46

1

You need to add `()` around each expression, or this has the same error. Note also the use of the bitwise `&` vs. the logical `and` operator. – AChampion Feb 03 '18 at 04:16

Diogo Cosin · Answer 2 · 2018-02-03T14:52:49.690

1

Disclaimer: as mentioned by @jp_data_analysis and pandas docs, the following solution is not the best one given it uses chained indexing. Please refer to Matt W. and AChampion solution.

An alternative one line solution.

>>> d = {'col1': [1, 2, 1], 'col2': [3, 4, 2]}
>>> df = pd.DataFrame(data=d)

>>> df[df.col1==1][df.col2==3]

   col1  col2
0     1     3

I have added a third row, with 'col1'=1 and 'col2'=2, so we can have an extra negative test case.

edited Feb 03 '18 at 14:52

answered Feb 03 '18 at 04:16

Diogo Cosin

107
11

1

downvoted - chained indexing is widely discouraged, see [pandas docs](https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) for an explanation why. – jpp Feb 03 '18 at 11:58
You are right @jp_data_analysis. I have added this information to the original answer. – Diogo Cosin Feb 03 '18 at 14:54

How to fllter a pandas DataFrame by multiple columns?

2 Answers2