-2

Suppose I have a dataframe:

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 7)), columns=list('ABCDEFG'))

that I need to pull a subset of df based on a series of conditions on selected columns but the columns are to be tuned. With said, the conditions are 'dynamic' that give various subsets, such as (to make the scenario easy, let's use all &):

dfs = df[(df.A > 10) & (df.C > 5) & (df.E < 30)]
dfs = df[(df.C > 5) & (df.D < 0)]
dfs = df[(df.B > 20) & (df.E < 30) & (df.F > 7) & (df.G < 15)]
...

Question is, is it possible to write a function that takes the columns and their compared values as some kind of **kwargs that returns different subset with different column list? Many thanks!

Please note sub-setting is not the main issue here. It's how we pass different combinations of different columns and their corresponding compared values as arguments.

Rock
  • 2,827
  • 8
  • 35
  • 47
  • This could help https://stackoverflow.com/a/46165056/6692898, but maybe include some pseudocode of your expected function – RichieV Sep 13 '20 at 01:25
  • @RichieV thanks for the link. sub-setting is not the main issue here. but how do we pass different combinations of different columns and compared values within the queries as arguments? – Rock Sep 13 '20 at 01:29

3 Answers3

1

One way to accomplish this would be to pass a tuple containing the comparator and the target value, as such:

>>> def apply_filters(df, **kwargs):
...   for col in kwargs.keys():
...     df = df.loc[kwargs[col][0](df[col], kwargs[col][1])]
...   return df

>>> apply_filters(df, A=(pd.Series.__lt__, 10), C=(pd.Series.__lt__, 10))
    A   B  C   D   E   F   G
52  0  22  2  35  81  56  10
54  2  83  5   7  91  38  79
ApplePie
  • 8,814
  • 5
  • 39
  • 60
1

IIUC, you can do something along those lines

def foo(df, *args):
    r=True
    for (i,arg) in enumerate(args):
        r = r & df[arg[0]].isin(arg[1])
    return df[r]

example:

foo(df, ['A',[51,83]],['B',[38,89]],['C',[45,27]]) 

will yield (for my random values)

    A   B   C   D   E   F   G
0  51  38  45  80  64   7  17
1  51  89  27  86  22  82  70

you can improve foo with this reference

adhg
  • 10,437
  • 12
  • 58
  • 94
1

May just do query

def jsutstring(df, x, y, z):
        return df.query(' and '.join(s + v + str(t) for t, s, v in zip(x, y, z)))
    
out = jsutstring(df, [10,10], ['A','C'], ['>', '>'])
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Interesting way to use `query`. Good to learn. Wish I could accept both answers but thanks for the elegant solution! – Rock Sep 13 '20 at 01:52