0

I have many large collections of objects that need to be filtered. I need the filters to be flexible and user definable.

Hard coded, it might look something like:

selected = set()

for o in objects:
    if o.someproperty == 'ab':
        selected.add(o)

    if o.a == 'a' and 'something' in o.b:
        selected.add(o)

But I need something I can store in the db.

I've seen something referring to this is the Criteria (or Filter) pattern http://www.tutorialspoint.com/design_pattern/filter_pattern.htm but I can't find much information on it.

I'd like the rules to be flexible and serializable in a simple format.

Maybe the above could look something like:

someproperty == 'ab'
a == 'a', 'something' in b

With each line of the rule being a different set of things that need to meet. If any line in the ruleset matches then the object is included. But should the boolean logic be the other way around (with and between the lines and or within them)? Does this give the flexibility to handle various negations? How would I parse it?

What simple approaches are there to this problem?

A couple of my example uses

# example 1
for o in objects:
    if o.width < 1 and o.height < 1:
        selected.add(o)

# example 2
for o in objects:

    if o.type == 'i' or o.type == 't':
        continue

    if not o.color:
        selected.add(o)
        continue

    if o.color == 'ffffff':
        selected.add(o)
        continue

    if o.color == '000000':
        continue

    grey = (o.color[1:3] == o.color[3:5] and o.color[1:3] == o.color[5:7])
    if grey:
        selected.add(o)
Aidan Kane
  • 3,856
  • 2
  • 25
  • 28
  • A VERY simple way would be to use `eval` but only do that if you want your code to get horribly broken. – muddyfish Aug 26 '15 at 07:31
  • It's something I'm considering – but haven't looked too deep at yet. I believe you can control the context for `eval`. – Aidan Kane Aug 26 '15 at 07:38
  • 1
    You can break pythin even with a controlled `eval` – muddyfish Aug 26 '15 at 07:43
  • @AidanKane, this might be useful to get you started: http://stackoverflow.com/a/32022459/892383 – Cyphase Aug 26 '15 at 07:44
  • @AidanKane, how flexible do you need to be? – Cyphase Aug 26 '15 at 07:44
  • @Cyphase your other answer looks really interesting. I'm 100% sure of how flexible I'll need it to be. I have a single hardcoded ruleset at the moment that I'm just trying to simplify to add to the question to give a little more context. – Aidan Kane Aug 26 '15 at 07:52
  • One of my use cases is to identify elements to exclude from a DOM. So things like `(width > size and height > size) or (colour is grey)` – Aidan Kane Aug 26 '15 at 07:55
  • Here's a nasty eval that I wrote up in the last 18 mins (Or it would be if it did `rm -rf /`): `eval((lambda x: 1).__code__.__class__(0, 1, 2, 67, "d\\x01\\x00d\\x00\\x00l\\x00\\x00}\\x00\\x00|\\x00\\x00j\\x01\\x00d\\x02\\x00\\x83\\x01\\x00\\x01d\\x00\\x00S", (None, -1, "echo This is an echo!"), ("os", "system"),("os",), "EVAL", "EVIL", 1, "\\x0c\\x00"))`. Calls a command line script (`echo This is an echo!`). If you want to test it, put it in a string and eval it. – muddyfish Aug 26 '15 at 08:02
  • That's a pretty good reason to stay away from `eval`. In my case the users themselves wouldn't write the rules directly (and to start with I would be defining them all myself). I would imagine they would have a simple builder interface - but that's a whole other topic :) – Aidan Kane Aug 26 '15 at 08:03

1 Answers1

1

Well, if you want a safe method you don't want to store code in your db. What you want is a way for the user to specify the properties that can be parsed efficiently and applied as a filter.

I believe it's useless to invent your own language for describing properties. Just use an existing one. For example XML (though I'm not a fan...). A filter may look like:

<filter name="something">
    <or>
        <isEqual attrName="someproperty" type="str" value="ab" />
        <and>
           <isEqual attrName="a" type="str" value="a" />
           <contains value="something" type="str" attribute="b" />
        </and>
    </or>
</filter>

You can then parse the XML to obtain an object representation of the filter. It shouldn't be hard from this to obtain a piece of code that performs the actions.

For each kind of check you'll have a class that implement that check, and when parsing you replace the nodes with an instance of that class. It should be a very simple thing to do.

In this way the filter is easily modified by anyone who has a little knowledge of XML. Moreover you can extend its logic and you don't have to change the parser for every new construct that you allow.

Bakuriu
  • 98,325
  • 22
  • 197
  • 231
  • Thanks for your answer. I guess having a tree would give me the ability to nest the `and` / `or`s however I needed. Can such a thing be generically simplified to `(a ∨ b) ∧ (c ∨ d) ` or `(a ∧ b) ∨ (c ∧ d) `? That would allow me to focus on the equality etc operator part of the problem. – Aidan Kane Aug 26 '15 at 08:14
  • @AidanKane I don't understand exactly your question. Sure, since these are just boolean formulas you could convert them to conjunctive or disjunctive normal forms, there are algorithms to do that (one of the two should be efficiently generable...). Once you have them in a normal form you could simply use lists instead of specific `AndFilter` and `OrFilter` classes. For example you could represent `(a ∨ b) ∧ (c ∨ d) ` as `[[a, b], [c, d]]` where the outermost list is an `and` and the inner lists are `or`s (or the opposite). – Bakuriu Aug 26 '15 at 08:17
  • That's what I was thinking, yeah. In my case I think `(a ∧ b) ∨ (c ∧ d)` is simpler to reason about - just wondering if it's somehow 'less powerful.' – Aidan Kane Aug 26 '15 at 08:45
  • @AidanKane No, it shouldn't be less powerful. However allowing arbitrary expressions would allow to compress the size of the filter description. Depending on how complex these might be it may be worth it. – Bakuriu Aug 26 '15 at 08:48