4

I have a dictionary that stores several pandas boolean masks as strings for a specific DataFrame, but I can't find a way to apply those masks.

Here is a short reproducible example:

import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

masks = {'old_strong' : "(df['age'] >18) & (df['strength'] >5)",
        'young_weak' : "(df['age'] <18) & (df['strength'] <5)"}

I would like to do something like:

df[masks['young_weak']]

But since the mask is a string, I get the error:

KeyError: "(df['age'] <18) & (df['strength'] <5)"
smci
vlemaistre

4 Answers

6

Use DataFrame.query with a modified dictionary in which the expressions refer to the column names directly:

masks = {'old_strong' : "(age >18) & (strength >5)",
        'young_weak' : "(age <18) & (strength <5)"}

print(df.query(masks['young_weak']))
   age  strength
0   10         0
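
For reference, a self-contained sketch of the same approach using the question's sample data. The variable names `selected`, `mask` and `strong_rows` are illustrative only. `DataFrame.eval` is used here to obtain the boolean mask itself rather than the filtered rows.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# The expressions refer to column names directly, without the df[...] wrapper.
masks = {'old_strong': "(age > 18) & (strength > 5)",
         'young_weak': "(age < 18) & (strength < 5)"}

# query() returns the rows for which the expression is True.
selected = df.query(masks['young_weak'])

# eval() returns the boolean Series, which can then be used for indexing
# or combined with other masks.
mask = df.eval(masks['old_strong'])
strong_rows = df[mask]
```

Because the strings no longer mention `df`, the same dictionary can be applied to any DataFrame that has `age` and `strength` columns.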
jezrael
  • Nice one ! But do you know if there is a way to do it without changing the dict, or by changing the dict to transform the strings into masks ? – vlemaistre Jun 05 '19 at 08:37
  • @vlemaistre - Unfortunately, changing the dictionary is necessary here. – jezrael Jun 05 '19 at 08:38
  • Thanks for your answer @jezrael, I'll accept the one with eval() (even though it's less clean than your solution) because it was closer to what I was looking for. But this helped me too :) – vlemaistre Jun 05 '19 at 08:45
  • @vlemaistre - yes, it is up to you - [Why is using 'eval' a bad practice?](https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice) – jezrael Jun 05 '19 at 08:46
  • 3
    Just for reference.... `.query` is just using `pd.eval` under the hood anyway according to the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html)... – Chris Adams Jun 05 '19 at 08:46
  • @ChrisA - yes, but there are some checks, because it fails if you change `eval` to `pd.eval` – jezrael Jun 05 '19 at 08:47
  • @jezrael Your answer is much better than mine, but since his requirements can only be fulfilled with `eval`, I used it. Your answer is very good, and you deserve the vote score you have now – U13-Forward Jun 05 '19 at 08:51
  • 2
    @U9-Forward - yes, maybe you can change your answer to say that it is possible, but that doing it is strongly discouraged. ;) – jezrael Jun 05 '19 at 08:53
  • 1
    @jezrael Added a little more to it – U13-Forward Jun 05 '19 at 08:55
1

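Another way is to set up the masks as functions (lambda expressions) instead of strings. When a callable is passed to df[...], pandas calls it with the DataFrame itself (not row by row) and uses the returned boolean Series as the mask. This works:

masks = {'old_strong': lambda frame: (frame['age'] > 18) & (frame['strength'] > 5),
         'young_weak': lambda frame: (frame['age'] < 18) & (frame['strength'] < 5)}
df[masks['young_weak']]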
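
A runnable sketch of the callable approach, assuming the question's sample data. The second frame `other` is made up for illustration; it shows that the callables are not bound to one particular DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# Each callable receives a whole DataFrame and returns a boolean Series.
masks = {'old_strong': lambda d: (d['age'] > 18) & (d['strength'] > 5),
         'young_weak': lambda d: (d['age'] < 18) & (d['strength'] < 5)}

# pandas calls the function with df and indexes with the result.
young_weak = df[masks['young_weak']]

# The same callables can be reused on another frame with the same columns
# (hypothetical data), here through .loc.
other = pd.DataFrame({'age': [5, 40], 'strength': [1, 8]})
old_strong_other = other.loc[masks['old_strong']]
```

Unlike precomputed masks, the condition is evaluated at indexing time, so it reflects the current contents of the frame.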
Itamar Mushkin
1

If you're allowed to change the masks dictionary, the easiest way is to store the boolean masks themselves rather than strings:

masks = {
   'old_strong' : (df['age'] >18) & (df['strength'] >5),
   'young_weak' : (df['age'] <18) & (df['strength'] <5)
}

Otherwise, keep the strings and use df.query(masks['young_weak']).
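
A short sketch of how stored masks can be used, again assuming the question's sample data. The names `young_weak` and `either` are illustrative. Note that the masks are computed once, when the dictionary is built, so they would need to be rebuilt if `df` changes afterwards.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# Each value is a boolean Series aligned with df's index.
masks = {
    'old_strong': (df['age'] > 18) & (df['strength'] > 5),
    'young_weak': (df['age'] < 18) & (df['strength'] < 5),
}

# Index directly with a stored mask.
young_weak = df[masks['young_weak']]

# Stored masks can be combined with the usual boolean operators.
either = df[masks['old_strong'] | masks['young_weak']]
```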

0

This is unsafe and generally considered bad practice, but the only way to use the strings unchanged is eval:

print(df[eval(masks['young_weak'])])

Output:

   age  strength
0   10         0

Here is the link to the reason it's bad.
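
If eval is used anyway, one way to narrow what the expression can reach is to pass explicit namespaces. The sketch below assumes the strings come from a trusted source; `apply_mask` is a hypothetical helper, not part of pandas, and restricting the namespaces does not turn eval into a sandbox.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 24, 35, 67], 'strength': [0, 3, 9, 4]})

# The original, unchanged mask strings from the question.
masks = {'old_strong': "(df['age'] >18) & (df['strength'] >5)",
         'young_weak': "(df['age'] <18) & (df['strength'] <5)"}


def apply_mask(frame, expression):
    """Hypothetical helper: evaluate a trusted mask string against `frame`."""
    # Empty builtins and a single exposed name limit what the expression
    # can reference. This reduces accidents; it is not a security boundary.
    restricted_globals = {"__builtins__": {}}
    mask = eval(expression, restricted_globals, {"df": frame})
    return frame[mask]


young_weak = apply_mask(df, masks['young_weak'])
```

Binding the name `df` explicitly also means the strings are evaluated against whichever frame is passed in, rather than whatever `df` happens to be in the enclosing scope.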

U13-Forward