Subset dataframe with dynamic conditions using various number of columns as arguments

Question

Suppose I have a dataframe:

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 7)), columns=list('ABCDEFG'))

that I need to pull a subset of df based on a series of conditions on selected columns but the columns are to be tuned. With said, the conditions are 'dynamic' that give various subsets, such as (to make the scenario easy, let's use all &):

dfs = df[(df.A > 10) & (df.C > 5) & (df.E < 30)]
dfs = df[(df.C > 5) & (df.D < 0)]
dfs = df[(df.B > 20) & (df.E < 30) & (df.F > 7) & (df.G < 15)]
...

Question is, is it possible to write a function that takes the columns and their compared values as some kind of **kwargs that returns different subset with different column list? Many thanks!

Please note sub-setting is not the main issue here. It's how we pass different combinations of different columns and their corresponding compared values as arguments.

This could help https://stackoverflow.com/a/46165056/6692898, but maybe include some pseudocode of your expected function — RichieV, Sep 13 '20 at 01:25
@RichieV thanks for the link. sub-setting is not the main issue here. but how do we pass different combinations of different columns and compared values within the queries as arguments? — Rock, Sep 13 '20 at 01:29

score 1 · Accepted Answer · answered Sep 13 '20 at 01:35

One way to accomplish this would be to pass a tuple containing the comparator and the target value, as such:

>>> def apply_filters(df, **kwargs):
...   for col in kwargs.keys():
...     df = df.loc[kwargs[col][0](df[col], kwargs[col][1])]
...   return df

>>> apply_filters(df, A=(pd.Series.__lt__, 10), C=(pd.Series.__lt__, 10))
    A   B  C   D   E   F   G
52  0  22  2  35  81  56  10
54  2  83  5   7  91  38  79

score 1 · Answer 2 · answered Sep 13 '20 at 01:36

1

IIUC, you can do something along those lines

def foo(df, *args):
    r=True
    for (i,arg) in enumerate(args):
        r = r & df[arg[0]].isin(arg[1])
    return df[r]

example:

foo(df, ['A',[51,83]],['B',[38,89]],['C',[45,27]])

will yield (for my random values)

    A   B   C   D   E   F   G
0  51  38  45  80  64   7  17
1  51  89  27  86  22  82  70

you can improve foo with this reference

answered Sep 13 '20 at 01:36

adhg

10,437
12
58
94

It's not 100% what I expect but thanks for answering! – Rock Sep 13 '20 at 01:45

score 1 · Answer 3 · answered Sep 13 '20 at 01:46

1

May just do query

def jsutstring(df, x, y, z):
        return df.query(' and '.join(s + v + str(t) for t, s, v in zip(x, y, z)))
    
out = jsutstring(df, [10,10], ['A','C'], ['>', '>'])

answered Sep 13 '20 at 01:46

BENY

317,841
20
164
234

Interesting way to use `query`. Good to learn. Wish I could accept both answers but thanks for the elegant solution! – Rock Sep 13 '20 at 01:52

Subset dataframe with dynamic conditions using various number of columns as arguments

3 Answers3