3

I have a method which takes a pandas dataframe as an input:

def dfColumnFilter(df, columnFilter, columnName):
    ''' Returns a filtered DataFrame

    Keyword arguments: 
    df           :  DataFrame in which to apply the filter
    columnFilter :  The list of which to filter by
    columnName   :  The DataFrame column to apply the columnFilter to '''

    # Keep the rows whose columnName value is any of the values in columnFilter
    return df[df[columnName].isin(columnFilter)]

The question is: how do I make this work for n columns?

ctrl-alt-delete

2 Answers

3

You can use variadic positional arguments (*args) to pass any number of (column, value) pairs:

def filter_df(df, *args):
    for k, v in args:
        df = df[df[k] == v]
    return df

It can be used like this:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1], 'b': [1, 3, 3, 3]})

>>> filter_df(df, ('a', 1), ('b', 3))
   a  b
2  1  3
3  1  3

Note

In theory, you could use **kwargs, which would have a more pleasing usage:

filter_df(df, a=1, b=2)

but then you could only use it for columns whose names are valid Python identifiers.
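
A minimal sketch of that **kwargs variant, assuming the same equality-only filtering as above. The name filter_df_kwargs is hypothetical and is used only to keep it distinct from filter_df:

```python
import pandas as pd


def filter_df_kwargs(df, **kwargs):
    # Hypothetical name: keeps the rows where every named column
    # equals the value passed for it.
    for column, value in kwargs.items():
        df = df[df[column] == value]
    return df


df = pd.DataFrame({'a': [1, 2, 1, 1], 'b': [1, 3, 3, 3]})
result = filter_df_kwargs(df, a=1, b=3)
```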

Edit

See the comment below by @Goyo for a way around this limitation.

Ami Tavory
  • I think you can use dictionary unpacking for names that are not valid identifiers:
    df = pd.DataFrame({'first one': [1, 2, 1, 1], 'second one': [1, 3, 3, 3]})
    filter_df(df, **{'first one': 1, 'second one': 2})
    – Stop harming Monica Feb 01 '16 at 13:50
1

You can combine boolean masks with &, as below:

filtered_df = df[(df[column1]=='foo') & (df[column2]=='bar')]

You can chain further conditions with &, wrapping each comparison in parentheses because & binds more tightly than ==.
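
For an arbitrary number of columns, the same mask can be built in a loop instead of being written out by hand. The sketch below assumes the conditions arrive as a dict mapping each column name to its required value; filter_by_conditions and conditions are hypothetical names, not part of the answer above:

```python
from functools import reduce
import operator

import pandas as pd


def filter_by_conditions(df, conditions):
    # Build one boolean mask per (column, value) pair.
    masks = [df[column] == value for column, value in conditions.items()]
    if not masks:
        # No conditions given: nothing to filter.
        return df
    # AND all masks together, which is equivalent to chaining them with &.
    return df[reduce(operator.and_, masks)]


df = pd.DataFrame({'a': ['foo', 'foo', 'baz'], 'b': ['bar', 'qux', 'bar']})
filtered_df = filter_by_conditions(df, {'a': 'foo', 'b': 'bar'})
```

Only the first row has both a == 'foo' and b == 'bar', so filtered_df contains that single row.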

Omrum Cetin