Filtering multiple columns Pandas

Question

I have a method which takes a pandas dataframe as an input:

def dfColumnFilter(df, columnFilter, columnName):
    ''' Returns a filtered DataFrame

    Keyword arguments: 
    df           :  DataFrame in which to apply the filter
    columnFilter :  The list of values to filter by
    columnName   :  The DataFrame column to apply the columnFilter to '''

    for column_filter in columnFilter:
        df=df[df[columnName] == column_filter]
        return df

The question is: how do I make this work for n columns?

score 3 · Accepted Answer · edited May 23 '17 at 12:34

You can use *args (variadic positional arguments) to pass any number of (column, value) pairs:

def filter_df(df, *args):
    for k, v in args:
        df = df[df[k] == v]
    return df

It can be used like this:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1], 'b': [1, 3, 3, 3]})

>>> filter_df(df, ('a', 1), ('b', 3))
   a  b
2  1  3
3  1  3

Note

In theory, you could use **kwargs, which would have a more pleasing usage:

filter_df(df, a=1, b=2)

but then you could only use it for columns whose names are valid Python identifiers.
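
A minimal sketch of the **kwargs variant described in this note, assuming each keyword names a column and each value is matched with ==. The name filter_df_kwargs is hypothetical, chosen only to keep it apart from filter_df. The second call shows that a dict unpacked with ** can carry column names that are not valid identifiers.

```python
import pandas as pd


def filter_df_kwargs(df, **kwargs):
    # Hypothetical helper: each keyword is a column name, each value is
    # the value that column must equal.
    for column, value in kwargs.items():
        df = df[df[column] == value]
    return df


df = pd.DataFrame({'a': [1, 2, 1, 1], 'b': [1, 3, 3, 3]})
result = filter_df_kwargs(df, a=1, b=3)

# Column names containing spaces cannot be written as keywords, but they
# can be passed by unpacking a dict.
df2 = pd.DataFrame({'first one': [1, 2, 1, 1], 'second one': [1, 3, 3, 3]})
result2 = filter_df_kwargs(df2, **{'first one': 1, 'second one': 3})
```

One limitation of this sketch: a column literally named df cannot be passed as a keyword, because it would collide with the first parameter.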

Edit

See the comment below by @Goyo for a way around the identifier restriction.

answered Feb 01 '16 at 13:02

Ami Tavory

I think you can use the dictionary syntax for invalid identifiers:
df = pd.DataFrame({'first one': [1, 2, 1, 1], 'second one': [1, 3, 3, 3]})
filter_df(df, **{'first one': 1, 'second one': 2}) – Stop harming Monica Feb 01 '16 at 13:50

score 1 · Answer 2 · answered Dec 10 '19 at 08:21

You can combine the conditions into a single boolean mask, as below:

filtered_df = df[(df[column1]=='foo') & (df[column2]=='bar')]

You can keep adding conditions with &, as long as each one is wrapped in parentheses.
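
For n columns, the same mask can be built in a loop instead of written out by hand. This is a minimal sketch, assuming equality filters supplied as a dict of column name to required value; filter_by_conditions and the sample data are hypothetical and not part of the answer above.

```python
from functools import reduce
import operator

import pandas as pd


def filter_by_conditions(df, conditions):
    # Hypothetical helper: conditions maps column name -> required value.
    if not conditions:
        # Nothing to filter on, so return the frame unchanged.
        return df
    # One boolean Series per condition.
    masks = [df[column] == value for column, value in conditions.items()]
    # AND the masks together, equivalent to mask1 & mask2 & ...
    combined = reduce(operator.and_, masks)
    return df[combined]


df = pd.DataFrame({'column1': ['foo', 'foo', 'baz'],
                   'column2': ['bar', 'qux', 'bar']})
filtered_df = filter_by_conditions(df, {'column1': 'foo', 'column2': 'bar'})
```

Combining the masks first and indexing once avoids creating an intermediate DataFrame for every condition.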

answered Dec 10 '19 at 08:21

Omrum Cetin
