2

Say I have the string

"((attr1=25 and attr2=8) or attr3=15)"

or

"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"

or

"(attrXYZ=10)"

or even

"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"

And a list that contains dictionarys, where each dictionary may or may not have the specified attribute in the string. Is there an easy way in Python to filter the dictionaries that match this type of string query?

johnsmith101
  • 179
  • 1
  • 2
  • 11
  • What is generating these strings? – juanpa.arrivillaga Dec 07 '16 at 05:11
  • @juanpa.arrivillaga they are inputted by the user, and don't necessarily have to fit that format. It could be something like `"(attr1=25)"`, or even something like `"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"` – johnsmith101 Dec 07 '16 at 05:14
  • @juanpa.arrivillaga I just dont know if there is some built in way to do this (new to python), or if I will have to resort to string manipulation – johnsmith101 Dec 07 '16 at 05:15
  • It sounds like you have a specific grammar which can be used to construct filter expressions. Look into using lexical parser like `pyparsing` or `ply`. Here is a quite related post: http://stackoverflow.com/questions/2371436/evaluating-a-mathematical-expression-in-a-string. – alecxe Dec 07 '16 at 05:19
  • `eval()` should not be our choice though - it is dangerous especially in your case where you cannot really trust the source of the filter string (here is a working example - but do not use it in practice https://gist.github.com/alecxe/ee798e21d961c687f040aa0a1e72fa64). – alecxe Dec 07 '16 at 05:21
  • @alecxe Would `eval()` be safe to use if we can assume that the filter string will be formatted correctly, similarly to my examples? The attributes may or may not exist in each dictionary, but I'm assuming `filter` takes care of that – johnsmith101 Dec 07 '16 at 05:25
  • @johnsmith101 the point is, eval is never safe. You can make things better by carefully validating and sanitizing the input filter string though. – alecxe Dec 07 '16 at 14:28

2 Answers2

1

Disclaimer: This is a very lazy and insecure solution that uses two of the most inglorious functions of Python, eval and exec, but can work if the output has exactly the same shape as the one that you provided.

Our strategy is to edit the inputs to look similar to the syntax that Python understands naturally, instead of creating our own parser. Doing that, we will use the dis module (Disassembler for Python bytecode) to get all names in the string.

import dis 

class Number:
    def __init__(self, n, exists=True):
        self.n = n
        self.exists = exists

    def __lt__(self, other):
        return self.n < other if self.exists else False

    def __le__(self, other):
        return self.n <= other if self.exists else False

    def __eq__(self, other):
        return self.n == other if self.exists else False

    def __ne__(self, other):
        return self.n != other if self.exists else False

    def __gt__(self, other):
        return self.n > other if self.exists else False

    def __ge__(self, other):
        return self.n >= other if self.exists else False


def clear_entries(entry):
    entry_output = entry.replace('!=', '<>').replace('=','==').replace('<>','!=')
    return entry_output

def check_condition(dict_, str_):
    str_ = clear_entries(str_)

    for k, v in dict_.items():
        exec("{0} = {1}".format(k, v))

    all_names = dis.Bytecode(str_).codeobj.co_names
    l_ = locals()
    non_defined_names = [v for v in all_names if v not in l_]

    for name in non_defined_names:
        exec("{0} = Number(0, exists=False)".format(name))  # the number value does not matter here (because of the 'exists' flag)

    if eval(str_):
        return True

    return False

Testing

if __name__ == '__main__':
    entries = [
        "((attr1=25 and attr2=8) or attr3=15)",
        "((attr1>25 and attr2<50) or (attr3=10 and attr4=20))",
        "(2<attrXYZ<10)",
        "(attr1=20 and attr2=20 and attr3=20 and attr4=20)",
        "(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"
    ]

    dicts = [
        {'attr1': 25, 'attr2': 8, 'attr3': 123},
        {'attr1': 1, 'attr2': 8, 'attr3': 123},
        {'attr1': 26, 'attr2': 8, 'attr3': 123, 'attr4': 1},
        {'attr1': 1, 'attr2': 50, 'attr3': 1, 'attr4': 20},
        {'attr1': -1, 'attr2': 50, 'attr3': 1, 'attr4': 20},
        {'attrXYZ': 3},
        {'attrXYZ': 10},
        {'attr1': 20}

    ]

    for entry in entries:
        for d in dicts:
            print(check_condition(d, entry), '"{0}"'.format(entry), d)

Result

(True, '"((attr1=25 and attr2=8) or attr3=15)"', {'attr1': 25, 'attr2': 8, 'attr3': 123})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attr1': 1, 'attr2': 8, 'attr3': 123})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attr1': 26, 'attr2': 8, 'attr3': 123, 'attr4': 1})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attr1': 1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attr1': -1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attrXYZ': 3})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attrXYZ': 10})
(False, '"((attr1=25 and attr2=8) or attr3=15)"', {'attr1': 20})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attr1': 25, 'attr2': 8, 'attr3': 123})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attr1': 1, 'attr2': 8, 'attr3': 123})
(True, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attr1': 26, 'attr2': 8, 'attr3': 123, 'attr4': 1})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attr1': 1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attr1': -1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attrXYZ': 3})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attrXYZ': 10})
(False, '"((attr1>25 and attr2<50) or (attr3=10 and attr4=20))"', {'attr1': 20})
(False, '"(2<attrXYZ<10)"', {'attr1': 25, 'attr2': 8, 'attr3': 123})
(False, '"(2<attrXYZ<10)"', {'attr1': 1, 'attr2': 8, 'attr3': 123})
(False, '"(2<attrXYZ<10)"', {'attr1': 26, 'attr2': 8, 'attr3': 123, 'attr4': 1})
(False, '"(2<attrXYZ<10)"', {'attr1': 1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"(2<attrXYZ<10)"', {'attr1': -1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(True, '"(2<attrXYZ<10)"', {'attrXYZ': 3})
(False, '"(2<attrXYZ<10)"', {'attrXYZ': 10})
(False, '"(2<attrXYZ<10)"', {'attr1': 20})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attr1': 25, 'attr2': 8, 'attr3': 123})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attr1': 1, 'attr2': 8, 'attr3': 123})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attr1': 26, 'attr2': 8, 'attr3': 123, 'attr4': 1})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attr1': 1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attr1': -1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attrXYZ': 3})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attrXYZ': 10})
(False, '"(attr1=20 and attr2=20 and attr3=20 and attr4=20)"', {'attr1': 20})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attr1': 25, 'attr2': 8, 'attr3': 123})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attr1': 1, 'attr2': 8, 'attr3': 123})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attr1': 26, 'attr2': 8, 'attr3': 123, 'attr4': 1})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attr1': 1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attr1': -1, 'attr2': 50, 'attr3': 1, 'attr4': 20})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attrXYZ': 3})
(False, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attrXYZ': 10})
(True, '"(attr1=20 or (attr2=20 and attr3=20 and attr4=20 and attr1231231=1))"', {'attr1': 20})
GustavoIP
  • 873
  • 2
  • 8
  • 25
  • would this method cover operators such as < or >? – johnsmith101 Dec 07 '16 at 07:47
  • Yes! I tested for "(2 – GustavoIP Dec 07 '16 at 07:53
  • Hi gustavo. Thank you for the solution - although this seems to be failing for queries such as `"(attr1=20 or attr2=25)"`, if the current dictionary doesnt have the `attr1` key. This is because when it cant find `attr1`, NameError is thrown, and we never get to check and see if `attr2=25`. How would you go about solving this? – johnsmith101 Dec 07 '16 at 13:26
1

ONLY DO THIS IF YOU'RE SURE YOUR QUERY STRING IS SAFE.

(EDIT: You really should use something like pyparsing instead of doing something quick and dirty.)

Do not use exec on the query string if the source is from untrusted input.

import re

QUERY_EXEC_RE = re.compile('(\w+)=')

def _matches(query_exec, d):
    a = []
    exec('a.append({0})'.format(query_exec), globals(), locals())
    return a[0]

def query_dicts(query, dicts):
    query_exec = QUERY_EXEC_RE.sub(r'd.get("\1") == ', query)
    return [d for d in dicts if _matches(query_exec, d)]

Example:

query = "((attr1=25 and attr2=8) or attr3=15)"
dicts = [
    dict(attr1=1, attr2=2, attr3=3),
    dict(attr1=25, attr2=7, attr3=12),
    dict(attr1=24, attr2=8, attr3=13),
    dict(attr1=25, attr2=8, attr3=14),
    dict(attr1=5, attr2=1, attr3=15),
    dict(attr3=15),
    dict(attr1=25, attr2=8),
]
answer = query_dicts(query, dicts)
print(answer)

[{'attr1': 25, 'attr2': 8, 'attr3': 14},
 {'attr1': 5, 'attr2': 1, 'attr3': 15},
 {'attr3': 15},
 {'attr1': 25, 'attr2': 8}]
mVChr
  • 49,587
  • 11
  • 107
  • 104