0

I'm trying to subset a pandas DataFrame in python based on two logical statements

i.e.

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df[df.col1 = 1 and df.col2 = 3]

but I'm getting invalid syntax on line3.

Is there a way to do this in one line?

Keiku
  • 8,205
  • 4
  • 41
  • 44
Zmonk
  • 3
  • 3
  • You confused assignment `=` and comparison `==`. – DYZ Feb 03 '18 at 04:40
  • 1
    This question is duplicated. https://stackoverflow.com/questions/22086116/how-do-you-filter-pandas-dataframes-by-multiple-columns – Keiku Feb 03 '18 at 06:31

2 Answers2

4

You need to use logical operators. == is equals for returning boolean, = is setting a value.

Try:

df[(df.col1 == 1) & (df.col2 == 3)]
AChampion
  • 29,683
  • 4
  • 59
  • 75
Matt W.
  • 3,692
  • 2
  • 23
  • 46
  • 1
    You need to add `()` around each expression, or this has the same error. Note also the use of the bitwise `&` vs. the logical `and` operator. – AChampion Feb 03 '18 at 04:16
1

Disclaimer: as mentioned by @jp_data_analysis and pandas docs, the following solution is not the best one given it uses chained indexing. Please refer to Matt W. and AChampion solution.

An alternative one line solution.

>>> d = {'col1': [1, 2, 1], 'col2': [3, 4, 2]}
>>> df = pd.DataFrame(data=d)

>>> df[df.col1==1][df.col2==3]

   col1  col2
0     1     3

I have added a third row, with 'col1'=1 and 'col2'=2, so we can have an extra negative test case.

Diogo Cosin
  • 107
  • 11
  • 1
    downvoted - chained indexing is widely discouraged, see [pandas docs](https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) for an explanation why. – jpp Feb 03 '18 at 11:58
  • You are right @jp_data_analysis. I have added this information to the original answer. – Diogo Cosin Feb 03 '18 at 14:54