8

I am looking for an algorithm to detect redundant rules.

Rules have a fixed number of input parameters and each parameter has a distinct domain.

Consider three rule parameters Color, Material and Size:

  • Color: Red, Green, Blue
  • Material: Wood, Glass, Aluminium
  • Size: Small, Medium, Large

Each rule can match on multiple values of a parameter or match on any value. The first rule that matches all parameters values is selected. There are no negation rules, but the domain is fixed, so a negation can be achieved by added all others.

       +--------------------------------------------------++-----------------
       |                   Rule Parameters                || Rule Action
       +----------------+------------------+--------------++-----------------
       | Color          | Material         | Size         || ==> Price
       +----------------+------------------+--------------++-----------------
Rule 1 | Red            | Wood, Glass      | Large        || $100
Rule 2 | Red            | Aluminium        | Large        || $200
Rule 3 | Blue or Green  | Wood             | Medium       || $300
Rule 4 | Red            | ** Any **        | Large        || $400  <== Redundant
Rule 5 | ** Any **      | ** Any **        | ** Any **    || $500

Rule 4 is redundant due to a combination of Rule 1 and 2. A redundant rule is a rule that will never be matched due to (a combination) of rules defined before that rule. The rule action is not evaluated in the redundancy check.

How to implement this efficiently (in Java)? It should support 1000 rules with 10 parameters with 100 values each. The rules, parameters and parameter values are read from a database (ie. they cannot be hard-coded).

The issue with efficiency is that there are 100^10 combinations of input parameters (each parameter has a domain of 100 values). The algorithm is needed in the rule editor gui, to support the user that is creating the rules. It should find all redundant rules within seconds.

GITHUB

I've created a repository to test proposed solutions: https://github.com/qlp/redundant-rules Currently only a BDD implementation, which is failing with a problem of this size. Maybe my BDD implementation can be improved?

Jurriaan
  • 89
  • 6
  • 3
    I've voted to close as too broad. This is merely a statement of requirements, not a decent SO question. You should really attempt a solution and come back if you encounter specific problems. – Duncan Jones Jun 23 '14 at 14:48
  • The problem is that I do not know an efficient algorithm. What should I try to do? Implement a brute-force solution to see that it has performance / memory issues? – Jurriaan Jun 23 '14 at 15:11
  • I would implement a Java application that creates the combination of test cases for your parameters. 10 parameters with 100 values is 10 trillion test cases. Then, run the test cases against the rule engine. In your small example, rule 4 would never be triggered, so rule 4 is redundant. – Gilbert Le Blanc Jun 23 '14 at 15:51
  • 1
    The question is indeed rather broad and lacks some details, but I think it is specific and clear enough to encourage answers with interesting potential approaches - thus, +1 – Marco13 Jun 23 '14 at 15:58
  • Why not retrieving all rows from DB and brute-force checking them? You would have O(n)= n^2 = 1000^2. – V G Jun 24 '14 at 10:27
  • Brute-force checking, means checking every combination of parameter values: 100^10 combinations. Currently we cannot do that within seconds. – Jurriaan Jun 24 '14 at 11:27
  • Based on the answers so far, I doubt that there is a general and efficient solution. Most answers aim at boolean term minimization and solvers, but these tend to involve 2^n terms for n variables, and/or require exponential time. One approach that could be "useful" based on the description so far could be to limit the implication checks to a certain number of rules. It is tremendously hard (i.e. practically impossible) to check whether rule 1000 is implied by any combination of the first 999 rules. But it would be feasible to check whether it is implied by, say, *any 3* of the first 999 rules – Marco13 Jun 25 '14 at 15:19
  • Vote-to-closers: It is clear the question is well formed, and that it is getting good answers. Why are you voting to close? – Ira Baxter Jun 25 '14 at 22:00
  • Not exactly what your asking for, but it might be better/easier to simply look at efficient ways to store and/or apply the rules (if either of those are an issue) rather than trying to remove redundancy. The reason being that given the number of rules VS the number of possible combinations suggests the level of redundancy in the rules is likely to be rather low anyway unless you have a rather good reason to believe otherwise. – Nuclearman Jun 26 '14 at 06:54
  • @Nuclearman Reason for detecting redundant rules is to inform the user (not performance). The user that adds a redundant rule is making a mistake. – Jurriaan Jun 26 '14 at 08:06
  • Ironically, while trying to justify why my point remains, I came up with what is probably an efficient solution to the problem. Making my point moot (only valid in the case of exponential growth). See the answer for details. – Nuclearman Jun 29 '14 at 09:12

7 Answers7

6

Essentially, you're tasked with implementing a decision tree. Each level of the tree corresponds to each parameter, and at each level, there will be a node for each value that the parameter could hold. At the leaf level, you would store the rule action that was applicable.

For example, at the Color level, you'd have 3 nodes, Red, Blue, Green. Each of those would have 3 children (on the Material level) Wood, Glass, Aluminum. Each of those would have 3 children (Small, Medium, Large). You would construct this tree based on learning the parameters and values from the database.

Once the tree is constructed, you'd read the rules from the database, one at a time, and store the 'rule action' at each of the leaves (in this case, at the Size level). If there was a rule that wanted to apply at a leaf that already had an action, then there is an ambiguity about which rule would apply. For your problem you stated that you'd take the first rule that applied - so you would not change the rule stored.

If there was a rule, where for each of the leaves it had an action for there already was a defined action, then that rule would be 'redundant' according to your example.

  • The question of whether one might not only want to detect *redundancies* but also *inconsistencies* came to my mind as well, and this would probably be easy with a tree-based approach. However, I'm not sure whether building the tree might not lead to a scalability issue: For the numbers mentioned in the question, you could end up with a tree with 100^10 leaf nodes... but maybe there could some tricky pruning be applied for the "ANY"-rules... – Marco13 Jun 23 '14 at 16:02
  • I would classify this as a 'brute force' solution; How to handle 100^10 nodes? – Jurriaan Jun 23 '14 at 16:59
  • What do you mean with *inconsistencies*, when the first-matching rule wins? – Jurriaan Jun 23 '14 at 17:01
  • You could have rules "Red,Wood,Large->100" and "Red,Wood,Large->200" - not exactly redundant, but inconsistent. – Marco13 Jun 23 '14 at 23:16
  • The second 'Red,Wood,Large' rule would be redundant; it will never match any input (I've added my definition of redundant rules to the question). – Jurriaan Jun 24 '14 at 05:10
  • 1
    @Jurriaan: If you insist the the rules have an *order*, then you can resolve the inconsistencies by declaring that first matching rule wins. If the rules don't have an order (not obvious from your example), then you just have inconsistencies with no resolution (or maybe you invent another one, e.g., the most specific rule wins, if there is one). A real problem with 1000 rules like this, is there *will* be (conceptual) inconsistencies, and people will end up "debugging" them somehow. IMHO, better you should *dianose* the inconsistencies to help them do that debugging. (PS: +1). – Ira Baxter Jun 25 '14 at 10:41
  • @Jurriaan: Slight revision to above: you were clear about ordering. My remark about debugging stands. – Ira Baxter Jun 25 '14 at 10:47
  • @IraBaxter Supporting users in debugging is a concern, but I would like to focus the discussion on the redundancy-checking challenge. – Jurriaan Jun 25 '14 at 12:56
  • The fact that OP has disjunctive formulas, and Not( – Ira Baxter Jun 25 '14 at 15:02
  • "How to handle 100^10 nodes?" - How big your problem domain is? How many parameters do you have, and how many values are there? – gaborsch Jun 29 '14 at 18:22
1

EDIT Completely rewritten due to the comments. (Note that this might actually be similar to what Khaled A Khunaifer wrote, but it has been created at the same time)

One possible approach could be to find a "logical" description of the rules. The rules have a quite regular form, namely always a conjunction of disjunctions. The rules can be written as

(Red) AND (Wood OR Glass) AND (Large)
(Red) AND (Aluminium) AND (Large)
...

Now, one could apply sophisticated optimizations and minimizations (like Quine McCluskey or any other form of minimization). However, I sketched a very simple implementation here. Rules can be "merged" when they only differ in one of the disjunctions. For example, the rules above could be merged to

(Red) AND (Wood OR Glass OR Aluminium) AND (Large)

Now, if a rule like

(Red) AND (Wood OR Glass OR Aluminium) AND (Medium)

was encountered, it could be merged with the previous one to

(Red) AND (Wood OR Glass OR Aluminium) AND (Large OR Medium)

If we then found a rule like

(Red) AND (Glass OR Aluminium) AND (Medium)

it could be marked as "redundant", because it is implied by the previous one.

To emphasize this again: This implementation is rather "hacky" and far from being "elegant", and there are certainly many possible improvements. But maybe it shows that the general idea is feasible, at least.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RulesChecker
{
    public static void main(String[] args)
    {
        List<Rule> rules = new ArrayList<Rule>();
        List<Set<String>> domains = new ArrayList<Set<String>>();
        domains.add(setOf("Red", "Green", "Blue"));
        domains.add(setOf("Wood", "Glass", "Aluminium"));
        domains.add(setOf("Small", "Medium", "Large"));

//      Rule 1 | Red            | Wood, Glass      | Large        || $100
//      Rule 2 | Red            | Aluminium        | Large        || $200
//      Rule 3 | Blue or Green  | Wood             | Medium       || $300
//      Rule 4 | Red            | ** Any **        | Large        || $400  <== Redundant
//      Rule 5 | ** Any **      | ** Any **        | ** Any **    || $500

        rules.add(new Rule("$100", disOf("Red")          , disOf("Wood", "Glass"), disOf("Large")));
        rules.add(new Rule("$200", disOf("Red")          , disOf("Aluminium")    , disOf("Large")));
        rules.add(new Rule("$300", disOf("Blue", "Green"), disOf("Wood")         , disOf("Medium")));
        rules.add(new Rule("$310", disOf("Blue")         , disOf("Wood")         , disOf("Medium")));
        rules.add(new Rule("$350", disOf("Green")        , disOf("Aluminium")    , disOf("Medium")));
        rules.add(new Rule("$400", disOf("Red")          , disOf(domains.get(1)) , disOf("Large")));
        rules.add(new Rule("$500", disOf(domains.get(0)) , disOf(domains.get(1)) , disOf(domains.get(2))));

        System.out.println("Rules: ");
        for (Rule rule : rules)
        {
            System.out.println(rule);
        }


        List<Rule> mergedRules = new ArrayList<Rule>();
        mergedRules.add(rules.get(0));
        for (int i=1; i<rules.size(); i++)
        {
            add(mergedRules, rules.get(i));
        }
    }


    private static void add(List<Rule> mergedRules, Rule newRule)
    {
        for (int i=0; i<mergedRules.size(); i++)
        {
            Rule oldRule = mergedRules.get(i);
            if (implies(oldRule, newRule))
            {
                System.out.println("Redundant "+newRule);
                System.out.println("   due to "+oldRule);
                return;
            }
            int mergeIndex = mergeIndex(oldRule, newRule);
            if (mergeIndex != -1)
            {
                Rule mergedRule = merge(oldRule, newRule, mergeIndex);
                mergedRules.set(i, mergedRule);
                System.out.println("Merging "+oldRule);
                System.out.println("    and "+newRule);
                System.out.println("  gives "+mergedRule);
                return;
            }
        }
        mergedRules.add(newRule);
    }

    private static boolean implies(Rule oldRule, Rule newRule)
    {
        Conjunction c0 = oldRule.conjunction;
        Conjunction c1 = newRule.conjunction;
        List<Expression> es0 = new ArrayList<Expression>(c0.terms);
        List<Expression> es1 = new ArrayList<Expression>(c1.terms);
        for (int i=0; i<es0.size(); i++)
        {
            Disjunction d0 = (Disjunction) es0.get(i);
            Disjunction d1 = (Disjunction) es1.get(i);
            if (!d0.terms.containsAll(d1.terms))
            {
                return false;
            }
        }
        return true;
    }


    private static Disjunction disOf(String ... ss)
    {
        return disOf(Arrays.asList(ss));
    }

    private static Disjunction disOf(Collection<String> ss)
    {
        List<Variable> list = new ArrayList<Variable>();
        for (String s : ss)
        {
            list.add(new Variable(s));
        }
        return new Disjunction(list);
    }

    private static int mergeIndex(Rule r0, Rule r1)
    {
        Conjunction c0 = r0.conjunction;
        Conjunction c1 = r1.conjunction;
        List<Expression> es0 = new ArrayList<Expression>(c0.terms);
        List<Expression> es1 = new ArrayList<Expression>(c1.terms);
        int different = 0;
        int mergeIndex = -1;
        for (int i=0; i<es0.size(); i++)
        {
            Expression e0 = es0.get(i);
            Expression e1 = es1.get(i);
            if (!e0.equals(e1))
            {
                mergeIndex = i;
                different++;
                if (different > 1)
                {
                    return -1;
                }
            }
        }
        return mergeIndex;
    }

    private static Rule merge(Rule r0, Rule r1, int index)
    {
        Conjunction c0 = r0.conjunction;
        Conjunction c1 = r1.conjunction;
        List<Expression> es0 = new ArrayList<Expression>(c0.terms);
        List<Expression> es1 = new ArrayList<Expression>(c1.terms);

        Set<Disjunction> rc = new LinkedHashSet<Disjunction>();
        for (int i=0; i<es0.size(); i++)
        {
            Disjunction d0 = (Disjunction) es0.get(i);
            Disjunction d1 = (Disjunction) es1.get(i);
            if (i == index)
            {
                Set<Expression> merged = new LinkedHashSet<Expression>();
                merged.addAll(d0.terms);
                merged.addAll(d1.terms);
                Disjunction d = new Disjunction(merged);
                rc.add(d);
            }
            else
            {
                rc.add(d0);
            }
        }
        return new Rule("TRUE", new Conjunction(rc));
    }

    private static Set<String> setOf(String ... s)
    {
        return new LinkedHashSet<String>(Arrays.asList(s));
    }

    static class Expression
    {
    }

    static class Variable extends Expression
    {
        final String name;

        Variable(String name)
        {
            this.name = name;
        }

        @Override
        public String toString()
        {
            return name;
        }

        @Override
        public int hashCode()
        {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((name == null) ? 0 : name.hashCode());
            return result;
        }

        @Override
        public boolean equals(Object obj)
        {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            Variable other = (Variable) obj;
            if (name == null)
            {
                if (other.name != null)
                    return false;
            }
            else if (!name.equals(other.name))
                return false;
            return true;
        }

    }

    static class Disjunction extends Expression
    {
        private final Set<Expression> terms;
        Disjunction(Collection<? extends Expression> expressions)
        {
            this.terms = new LinkedHashSet<Expression>(expressions);
        }
        @Override
        public String toString()
        {
            StringBuilder sb = new StringBuilder();
            sb.append("(");
            int n = 0;
            for (Expression e : terms)
            {
                sb.append(e);
                n++;
                if (n < terms.size())
                {
                    sb.append(" + ");
                }
            }
            sb.append(")");
            return sb.toString();
        }
        @Override
        public int hashCode()
        {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((terms == null) ? 0 : terms.hashCode());
            return result;
        }
        @Override
        public boolean equals(Object obj)
        {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            Disjunction other = (Disjunction) obj;
            if (terms == null)
            {
                if (other.terms != null)
                    return false;
            }
            else if (!terms.equals(other.terms))
                return false;
            return true;
        }
    }

    static class Conjunction
    {
        private final Set<Expression> terms;
        Conjunction(Collection<? extends Expression> expressions)
        {
            this.terms = new LinkedHashSet<Expression>(expressions);
        }
        @Override
        public String toString()
        {
            StringBuilder sb = new StringBuilder();
            sb.append("(");
            int n = 0;
            for (Expression e : terms)
            {
                sb.append(e);
                n++;
                if (n < terms.size())
                {
                    sb.append(" * ");
                }
            }
            sb.append(")");
            return sb.toString();
        }
        @Override
        public int hashCode()
        {
            final int prime = 31;
            int result = 1;
            result = prime * result + ((terms == null) ? 0 : terms.hashCode());
            return result;
        }
        @Override
        public boolean equals(Object obj)
        {
            if (this == obj)
                return true;
            if (obj == null)
                return false;
            if (getClass() != obj.getClass())
                return false;
            Conjunction other = (Conjunction) obj;
            if (terms == null)
            {
                if (other.terms != null)
                    return false;
            }
            else if (!terms.equals(other.terms))
                return false;
            return true;
        }
    }

    private static class Rule
    {
        Conjunction conjunction;
        String action;

        @SafeVarargs
        Rule(String action, Disjunction ... disjunctions)
        {
            this.action = action;
            this.conjunction = new Conjunction(Arrays.asList(disjunctions));
        }

        Rule(String action, Conjunction conjunction)
        {
            this.action = action;
            this.conjunction = conjunction;
        }

        @Override
        public String toString()
        {
            return conjunction+" -> "+action;
        }
    }
}
Marco13
  • 53,703
  • 9
  • 80
  • 159
  • This will not work, because you cannot isolate the domains of the input parameters. Eg. 'Green' in 'Rule 3' will only match *in combination with* wood and medium. – Jurriaan Jun 23 '14 at 17:04
  • As mentioned in the comments: Some more details may be helpful. So one has to assume that each and every of the 100^10 *possible* rules might or might not be present, and there are 1000 rules specified that cover the whole domain - is this correct? – Marco13 Jun 23 '14 at 23:17
  • No sure what you mean with _cover the whole domain_? There are 100^10 possible inputs (added this to the description). The rules do not need to cover each combination. There is always a default action when no rule matches. – Jurriaan Jun 24 '14 at 05:17
  • I've added `rules.add(new Rule("$350", setOf("Green"), setOf("Aluminium"), setOf("Medium")));` between rule 3 & 4. This rule is not redundant, but is marked as redundant by your implementation. – Jurriaan Jun 24 '14 at 07:34
  • I think 'applying sophisticated optimizations and minimizations' as you mentioned is the only way forward to solve this efficiently? – Jurriaan Jun 24 '14 at 11:24
  • The proposed method is already applying a minimization. This could be extended and improved, but nobody will implement a boolean expression minimization framework for you so that you can pick the solution that best suits your needs. You did not specify what you mean by "efficient", and you did not provide adequate test cases. Then I have to agree that your question is too broad, and consider adding the last missing vote to let it be closed.... – Marco13 Jun 24 '14 at 12:54
  • I'm asking for an 'algorithm' that can solve this problem within seconds (on reasonable hardware). Current answers show variations of brute-force approaches. I've set the parameters in such a way that brute force is not feasible. An answer like 'this is not possible' or 'this is extremely complex' is also fine. I'm not asking for anyone to implement a minimization framework, but it might already exist? Maybe there is another solution? I don't know and that why i'm asking. – Jurriaan Jun 24 '14 at 13:21
  • 1
    The problem of minimizing a boolean expression is NP hard, so this for itself and in general is no solution - and I think this might be a hint that there **is** no efficient solution to your problem at all. But since I can hardly imagine editing 1000 rules with 10 parameters and 100 values *manually* in a GUI, one has to assume that in reality, the problem is much more constrained than you told until now. For example: Could it really be the case that you have 1000 *different* rules of the form `(99 values, 99 values, ... (10 times)..., 99 values)`? If so, I guess you're out of luck... – Marco13 Jun 24 '14 at 16:32
  • You are right; this example is a bit exaggerated because I am looking for an algorithm that doesn't use (an optimized version of) brute force. In reality our users are able to define their own number of parameters and also configure the domain of each parameter. The domain values can be very large, e.g. a country-code, currency, product,... whatever. _Plan-B_ is to implement a brute-force algorithm and (when the number of combinations explode) warn the user when redundancy in rule definitions is not (fully) calculated. – Jurriaan Jun 24 '14 at 18:00
  • Further "discussion" should not be done here. The latter at least sounds like a possible way to go. Until now, it sounds as if the number of values that the user will specify **per rule** and *per parameter* could be low, and many combinations are not covered explicitly, but by the "ANY" wildcard. Maybe one can derive a tricky approach that takes this fact into account (name that only few possible parameter values are actually *used explicitly*), but this would require re-thinking the problem statement and doing some research e.g. (a guess:) about (boolean) minimizations. – Marco13 Jun 24 '14 at 18:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/56227/discussion-between-jurriaan-and-marco13). – Jurriaan Jun 24 '14 at 18:59
1

One possibility is to use binary decision diagrams. These allow efficient manipulation of Boolean formulas. As mentioned by others, you can think of a rule as a binary formula: Rule 1 is (r OR (w AND g) OR l) and Rule 2 is (r AND a AND l), and to decide whether Rule 4 is redundant, we need to check whether (Rule4 AND (NOT (Rule1 OR Rule2 OR Rule3))) has a solution. Additionally, clauses like (NOT (w and g)) need to be used to disallow invalid inputs. There are many solvers available, and while there is no guarantee that running time or memory usage don't blow up, they typically work very well for real-world inputs.

Falk Hüffner
  • 4,942
  • 19
  • 25
  • I'll look into BDD, it's really new for me. Any solver-suggestions for Java? How will this work? I do not know what you mean with _whether ... has a solution_. – Jurriaan Jun 25 '14 at 13:16
  • I haven't actually tried any solvers for Java. "Has a solution" means that there is an assignment to the Boolean variables that makes the formula evaluate to true. – Falk Hüffner Jun 26 '14 at 20:35
  • Please have a look at my BDD implementation on [github](https://github.com/qlp/redundant-rules/blob/master/src/main/java/com/darrep/redundantrules/detector/bdd/BddDetector.java). Disclaimer: This is the first time i'm using BDD... – Jurriaan Jun 28 '14 at 08:32
1

Each of your rules represents a boolean condition. It isn't clear how expressive your rule notation is (can I say, "not Wood"?), but Rule 1 appears to be

   Color==Red and (Material==Wood or Material==Glass) and Size==Large

You also likely have some "domain constraints", C(k), presumably of the form:

   Color==Red ==>  Not(Color==Green)

where ==> is the logical implication operator.

What you appear to want to know, roughly, is if for some i and j, with R(i) meaning "ith Rule":

   R(i) ==> R(j)

In practice, you need to know if:

   R(i) and C(1) and C(2) and ... C(K) ==> R(j)

You can (and must) solve this by essentially doing boolean algebra. This is actually relatively straightforward to do, e.g., treating the WHOLE equation (including the implication operator) as an actual symbolic formula and applying laws of boolean algebra.

Let's work your example algebraically (longish)

To start, for your example system of rules, the domain constraints C(i) are:

  Color==Red ==> Not(Color == Green)
  Color==Red ==> Not(Color == Blue)
  Color==Blue ==> Not(Color == Green)
  Color==Blue ==> Not(Color == Red)
  Color==Green ==> Not(Color == Red)
  Color==Green ==> Not(Color == Blue)
  (Color==Red or Color==Blue or Color===Green) <==> true   [if only these 3 colors exist]

and similarly for Materials (over Wood, Glass, Aluminum) and Size (Small, Medium, Large).

For your specific example, R(1) is:

  Color==Red and (Material==Wood or Material==Glass) and Size==Large

and R(4) is:

  Color==Red and (true) and Size==Large

What you want to know is if:

  R(1) and <domainconstraints> ==> R(4)

That's a pretty huge formula, but that's really only a problem for the computer. In this particular case, we don't need the domain constraints (I, the oracle, speaketh) so I'm just going to leave it out to get rid of the hugeness:

  R(1) and true ==> R(4)

or just

  R(1) ==> R(4)

(Implementation hint: you can always try R(i) ==> R(j) first before trying R(i) and ==> R(j), because the first implies the second. This can be used to keep the size of the terms down if a redundancy is detected).

The "a ==> b" operator is boolean equivalent to " ~a or b" so you want to know if this formulat is true:

  Not(R(1)) or R(4) 

Plugging in the definitions of R(1) and R(4):

  Not(Color==Red and (Material==Wood or Material==Glass) and Size==Large) or (Color==Red and (true) and Size==Large)

Using DeMorgan's law and simplifying out "and (true)":

 Not(Color==Red) or Not( (Material==Wood or Material==Glass) ) or Not( Size==Large)  or Color==Red and Size==Large

Apply DeMorgan's law a second time (we don't realy need this because Redness is going to drive the answer but we're not supposed to know that yet):

 Not(Color==Red) or Not(Material==Wood) and Not(Material==Glass) or Not(Size==Large)  or Color==Red and Size==Large

A fact:

 a and b ==> a

so, for any b,

 a and b or a == a

Using this with a being Not(Color==Red) and b being Size==Large:

 Not(Color==Red) and Size==Large or Not(Color==Red) or or Not(Material==Wood) and Not(Material==Glass) or Not(Size==Large)  or Color==Red and Size==Large

Now we group like terms:

 Not(Color==Red) and Size==Large or Color==Red and Size==Large or Not(Color==Red) or or Not(Material==Wood) and Not(Material==Glass) or Not(Size==Large)  

Combining terms having Size==Large:

 ( Not(Color==Red) or Color==Red) and Size==Large Not(Color==Red) or Not(Material==Wood) and Not(Material==Glass) or Not(Size==Large)  

using a or ~ a == true:

 ( true ) and Size==Large or Not(Color==Red) or Not(Material==Wood) and Not(Material==Glass) or Not(Size==Large)

simplifying and grouping Size terms:

 (Size==Large or Not(Size==Large)) or Not(Color==Red) or Not(Material==Wood) and Not(Material==Glass) 

giving (whew)

(true) or ...

which produces true, so R(1) ==> R(4) and thus R(4) is redundant.

Implementation (or TL;DR)

Obviously you don't want to do this by hand. You can encode the formulas as boolean expressions (you've already done something like this to be able to store them in your system). What you want to know is if this statement as a whole is a tautology; for that, you could use Wang's rules of inference. That's a little clumsy to implement but doable.

Binary Decision Diagrams (awful name) are a way to encode "truth tables" for a formula. What you want is to decide if the truth table is always true for a formula. As another poster noted, you can get BDD packages. If you build the BDD for this implication equation, and it is "TRUE" everywhere (that is, you get a trivial BDD), then the implication is true and you have a redundancy.

The running time of checking this might get to be expensive. You can either put up with that expense, or simply put an upper bound on how much CPU you will give each attempt; either it produces an answer "no conflict", "conflict", or "timeout", with "timeout" being interpreted as "no conflict". You can then make this run as fast as you like, modulo sometimes not eliminating redundant rules.

Since your rules are independent declarations, if you have N rules, you'll need to do this test approximately N^2 times. (The good news, is that if you have already built up a trusted database of N rules, adding a new rule will only require N tests).

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • It is not possible negate, but since the domain of values is fixed the user can select the complete domain except 'wood'. I'll look into Wang and BDD. _Plan B_ is to limit the amount of CPU as you mentioned, but i'll focus first on finding a way to prevent the timeout scenario. – Jurriaan Jun 25 '14 at 13:09
  • So you in effect have "Not(Material==Wood)", etc. OK. The answer works regardless of whether you have general not, specific not (your case), or, no not at all. – Ira Baxter Jun 25 '14 at 14:00
  • I'll look into BDD, it's really new for me. Any BDD suggestion for Java? (also asked this @FalkHüffner) – Jurriaan Jun 25 '14 at 14:58
  • Please have a look at my BDD implementation on [github](https://github.com/qlp/redundant-rules/blob/master/src/main/java/com/darrep/redundantrules/detector/bdd/BddDetector.java). Disclaimer: This is the first time i'm using BDD... – Jurriaan Jun 28 '14 at 08:32
  • I did. It has the right idea in that you area building up a bdd for each rule. A confusing topic is that I don't know how you represent your individual formulas in "ValueList"; given your example in your question, I'd expect the equivalent of "Conjunctive Normal Form" e.g. a sequence of "OR" terms. The implementation looks wrong to me from several points of view: 1) your BDD assembly for a rule should then build up a list of individual OR terms, and then compose those with ANDs; you seem to build a bunch of ORs but I can't see a list of them. ... – Ira Baxter Jun 28 '14 at 09:21
  • ... 2) You OR *all* your rules together; you want one BDD *per* rule, IMHO. 3) Your redundancy comparison is "all rules together equal this rule"; what you want is "does any rule imply this rule"? You can implement "a ==> b" using the boolean equivalent of "~ a or b". – Ira Baxter Jun 28 '14 at 09:25
  • ... 4) Your domain constraints are needed IMHO but they aren't anywhere in sight. Your biggest problem is going to be understanding what a BDD you built *really* represents as a formula; it is worth your time to implement a "BDD-to-equivalent-rule" printer; the BDD package should have a way to tear a BDD back down so you can inspect the pieces and print what they mean. I don't know this package. As your very first attempt, its pretty good. Now you get to learn how to really use it right. – Ira Baxter Jun 28 '14 at 09:28
  • I've added a by-hand worked example. The BDD version won't look like that but will give the same effect. – Ira Baxter Jun 28 '14 at 10:15
  • *(My BDD implementation) is failing with a problem of this size* A big part of your trouble is combining *all* the rules into a giant result. Just build BDDs for individual rules. (That still may not work if the rules plus domain constraints are complex enough but its the first thing to try). And a cheap shot is, if the term is "too big", you can always declare that the target rule is not redundant :-} – Ira Baxter Jun 28 '14 at 10:25
  • Puzzled... (and not understanding it completely, yet). _which produces true, so R(1) ==> R(4) and thus R(4) is redundant_; R(4) is redundant due to the existence of R(1) and R(2). Without R(1), R(4) would not be redundant, because Red/Wood/Glass results in $400. – Jurriaan Jun 28 '14 at 12:35
  • About the implementation; You're right; "all rules **before this rule** together equal this rule", that's because I want to know whether there is a rule that has no effect on the outcome ==> when i add a rule to the combined-bdd and the combine-bdd does not change, that rule is redundant. I need some more time to process... – Jurriaan Jun 28 '14 at 12:44
  • A) R(4) is redundant if R(1) is present; you don't need R(2) to demonstrate this. B) Yes, you can try to combine all the conditions, but if that runs you out of space, maybe you shouldn't do that. I've suggested checking rule pairs might not give you this problem already. – Ira Baxter Jun 28 '14 at 17:17
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/56463/discussion-between-jurriaan-and-ira-baxter). – Jurriaan Jun 28 '14 at 20:08
0

First, let's take a look at your example, and instead of **ANY** we put all options

       +---------------------------------++--------------
       |         Rule Parameters         || Rule Action
       +----------+-----------+----------++--------------
       | Color    | Material  | Size     || Price
       +----------+-----------+----------++--------------
Rule 1 | R        | W,G       | L        || $100
Rule 2 | R        | A         | L        || $200
Rule 3 | B,G      | W         | M        || $300
Rule 4 | R        | A,W,G     | L        || $400
Rule 5 | R,B,G    | A,W,G     | S,M,L    || $500

Next, we need to define how a Rule can be redundant, so here's my definition

A rule is redundant if it is equal to the merge of any k rules other than itself.

So, based on this definition (with the note that 1 <= k < n where n is number of rules) we can see that finding all redundant rules would take alot of time.

So, if we have n = 100 rules in our rule set, this brute-force will take around n^3 * n! which is Time ~ 100^3 * (1! + 2! + .. + 100!) = 10^6 * 9.4 * 10^158 = 9.333 * 10^164.

Here's my version of brute-force checking which takes O(n^3 * SUM { k! | k = 1..n } ):

boolean isRedundant (RuleList R, Rule d)
{
    // R is the set of rules
    // T is a copy of the set, without the rule to be checked
    // d is the rule to be checked

    T = new RuleList(R);
    T.remove(d);

    for (int i=0; i<T.size(); i++) // check single rules
        if (T.get(j).equal(d))
            return true;

    for (int k=1; k<R.size(); k++) // 1 <= k < n
    {
        // merge every k-permutation to check
        for (RuleList permutation : T.getPermutations(k))
        {
            Rule r = new Rule ();

            for (Rule p : permutation)
                r.merge(p);

            if (r.equal(d))
                return true;
        }
    }

    return false;
}

Class Rule

class Rule
{
    public boolean [3] color; // R,G,B
    public boolean [3] material; // A,W,G
    public boolean [3] size; // S,M,L

    public Rule ()
    {
        color = { false, false, false };
        material = { false, false, false };
        size = { false, false, false };
    }

    public void merge (Rule r)
    {
        color[0] = and(color[0], r.color[0]);
        color[1] = and(color[1], r.color[1]);
        color[2] = and(color[2], r.color[2]);

        material[0] = and(material[0], r.material[0]);
        material[1] = and(material[1], r.material[1]);
        material[2] = and(material[2], r.material[2]);

        size[0] = and(size[0], r.size[0]);
        size[1] = and(size[1], r.size[1]);
        size[2] = and(size[2], r.size[2]);
    }

    public boolean equal (Rule r)
    {
        return (color[0]==r.color[0]
                  && color[1]==r.color[1]
                  && color[2]==r.color[2]
                  && material[0]==r.material[0]
                  && material[1]==r.material[1]
                  && material[2]==r.material[2]
                  && size[0]==r.size[0]
                  && size[1]==r.size[1]
                  && size[2]==r.size[2]);
    }

    public boolean and(boolean a, boolean b)
    {
        return (a && b);
    }
}

Alternative Solution

Let's say that we have sets that represent each parameter, then we can reduce the number of rules to be checked from total number of rules, also this can optimize time complexity since we will see all members as one from that perspective, then go through a simple matching algorithm.

       +---------------------------------++--------------
       |         Rule Parameters         || Rule Action
       +----------+-----------+----------++--------------
       | Color    | Material  | Size     || Price
       +----------+-----------+----------++--------------
Rule 1 | R        | W,G       | L        || $100
Rule 2 | R        | A         | L        || $200
Rule 3 | B,G      | W         | M        || $300
Rule 4 | R        | A,W,G     | L        || $400
Rule 5 | R,B,G    | A,W,G     | S,M,L    || $500

We have the following:

Color_R = { 1, 2, 4, 5 }
Color_B = { 3, 5 }
Color_G = { 3, 5 }

Material_A = { 2, 4, 5 }
Material_W = { 1, 3, 4, 5 }
Material_G = { 1, 4, 5 }

Size_S = { 5 }
Size_M = { 3, 5 }
Size_L = { 1, 2, 4, 5 }

Now, for every rule, we take its parameter sets, then match back without looking at the same rule ..

Example:

Rule_4 = { Color_R, Material_A, Material_W, Material_G, Size_L }

Color_R    = { 1, 2, x, 5 }
Material_A = { 2, x, 5 }
Material_W = { 1, 3, x, 5 }
Material_G = { 1, x, 5 }
Size_L     = { 1, 2, x, 5 }

We check for set of rules that match all of the memberships:

Matched = { 5, 1+2 }

Then we compare them with the selected rule, by set-difference looking for phi:

Rule_5 - Rule_4 = { Color_B, Color_G, Size_S, Size_M }

> Rule_5 does not match

(Rule_1 + Rule2) - Rule_4 = { }

> (Rule_1 + Rule2) match

Finally, we conclude:

(Rule_1 + Rule2) = Rule_4

> Rule_4 is redundant
Khaled.K
  • 5,828
  • 1
  • 33
  • 51
  • What is the internal datastructure of a Rule? How will the `merge` implementation merge rule 2 and 3? – Jurriaan Jun 24 '14 at 08:38
  • @Jurriaan code added. `isRedundant` code is fixed, we need to check all k-permutations. – Khaled.K Jun 24 '14 at 10:03
  • Interesting one (conceptually similar to my updated answer). The permutations will make this approach infeasible even for "relatively small" values of `n`. But don't you think that the fact that the rules have to be checked *in order* (and the "first matching" rule matters) can be exploited here? Somehow that only *previous* rules have to be checked? (Only a rough idea, but maybe worth considering...) – Marco13 Jun 24 '14 at 10:04
  • @Marco13 at first, I thought it was `n^3`, but turns out that if we have `n` rules, we need to check all k-permutations for `k < n`, which cost `O(n^3 n!)` .. he asked for the implementation of Rule, so I simplified it to binary-states and logic-gates to avoid having to do parse – Khaled.K Jun 24 '14 at 10:12
  • The order of the rules that imply another rule is not important in general, so I assume that by "permutations" you actually mean "elements of the power set" ( http://en.wikipedia.org/wiki/Power_set - this is even larger than `n!`). I still think that some optimizations may be possible here, but would have to think about this more thoroughly... – Marco13 Jun 24 '14 at 10:16
  • @Marco13 yes, I think it's `n! + (n-1)! + (n-2)! + .. + 2! + 1!` – Khaled.K Jun 24 '14 at 10:24
  • 1
    Hm. Not sure. But if you have rules A,B,C,D, and you want to check whether D is redundant, you have to check whether D is implied by A, or by B, or by C, or by A+B, or by A+C, ... (and so on). But after you have checked whether it is implied by A+B, you do *not* have to check whether it is implied by B+A (the order does not matter). But maybe I misunderstood something here. – Marco13 Jun 24 '14 at 10:53
  • Anyway, i'm looking for a _smarter_ way to solve this, like 'applying sophisticated optimizations and minimizations' as @Marco13 mentioned. – Jurriaan Jun 24 '14 at 11:22
  • 1
    The issue is that there are two ways to approach these kinds of problems. Start with the empty set and add try to find redundant rules or start with the set of all rules and try to determine which ones are required then verify all remaining are actually redundant. The first way has exponential growth, the second doesn't because you can quickly check if when the rules are combined they are equal to the one being considered. – Nuclearman Jun 29 '14 at 23:17
  • 1
    @KhaledAKhunaifer: Alternative solution looks good and more-less matches what I used in my answer. Although, how you eliminated rule 3 as a possibility could be clearer. I don't think you can get exponential growth, but you might get incorrect solutions depending on how you handle the step in the algorithm. Rule 3 should be removed (at least in my opinion and answer) because the set contains elements that are not in the rule considered. That is ((rule 3) - (rule 4)) is not the empty set. – Nuclearman Jun 30 '14 at 09:18
0

There is extensive work in the field of pattern matching within strongly typed functional languages. The question of what it actually is has even been talked about here, on stackoverflow.

The publication page of Luc Maranget contains many articles on the topic in various settings, some of which served as a basis for the OCaml pattern matching algorithm.

His paper on compiling pattern matching to good decision trees could be a definitive source of inspiration for your problem. It also provides references for previous articles on the same topic. The basic principle is to describe patten matching sets as integer matrices, along with algebraic operations.

I also highly recommend trying , or any implementation, to get a feel of that central feature of the ML language family. It might be also interesting for you to check out , which implements similar functionalities.

Community
  • 1
  • 1
didierc
  • 14,572
  • 3
  • 32
  • 52
0

The number of rules in this problem is quite large, but the number of values is limited. Thats why I think a rule by rule evaluation will not give the quickest results. If the rules are grouped by value, multiple rules can be evaluated together. The algorithm I came up with can still explode, but I hope is less likely to do so.

Sample data:

Rule #: [Color (R,G,B),Material (W,G,A),Size (S,M,L)]
Rule 1: [[1,0,0],[1,1,0],[0,1,0]]
Rule 2: [[1,0,0],[0,0,1],[0,1,0]]
Rule 3: [[0,1,1],[1,0,0],[0,0,1]]
Rule 4: [[1,0,0],[1,1,1],[0,1,0]]
Rule 5: [[1,1,1],[1,1,1],[1,1,1]]

(Copied from Nuclearman)

Step 1: Split rules by value of parameter 1

Group the rules in a Set by a value for parameter 1. Probably a heuristic is needed to pick the best first parameter. For now I just pick them in order.

Result:
Rules in Set R: 1,2,4,5
Rules in Set G: 3,5
Rules in Set B: 3,5

Step 2: Merge Equal Sets

Set G en B contain the same rules at this level, so can be joined.

Result:
Rules in Set R: 1,2,4,5
Rules in Set GB: 3,5

Step 3: Split groups by value of parameter 2

For all the set the rule are now check on parameter 2. For each different value the set is split into subsets. The sets of step 2 are no longer relevant.

Result:
Rules in Set (R ,W): 1,4,5
Rules in Set (R ,G): 1,4,5
Rules in Set (R ,A): 2,4,5
Rules in Set (GB,W): 3,5
Rules in Set (GB,G): 5
Rules in Set (GB,A): 5

Step 4: Remove single rules

Rule 5 is now proven to be relevant. It singly catches (G,G,*), (B,G,*), (G,A,*) and (B,A,*). Set (GB,G) and (GB,A) can be remove from the evaluation, as they have nothing to prove anymore.

Step 5: Merge Equal Sets

Result:
Rules in Set (R ,WG): 1,4,5
Rules in Set (R ,A ): 2,4,5
Rules in Set (GB,W ): 3,5

Step 3: Split groups by value for parameter 3

Result:
Rules in Set (R ,WG, S): 5
Rules in Set (R ,WG, M): 1,4,5
Rules in Set (R ,WG, L): 5
Rules in Set (R ,A ,S): 5
Rules in Set (R ,A ,M): 2,4,5
Rules in Set (R ,A ,L): 5
Rules in Set (GB,W ,S): 5
Rules in Set (GB,W ,M):5
Rules in Set (GB,W ,L): 3,5

Rule 5 is clearly proven to be useful, no counting rule sequence. 1, 2, 3 are proven, because there are the highest priority in there Set. Rule 4 is never prove and is redundant.

Now in pseudo-code:

    void getProven(RuleList rulelist, RuleList provenlist, ParameterIterator i)
    {
        ValuesToRulelistMap subsets = new ValuesToRulelistMap(); 
        Parameter parameter = i.next();
        for (Rule rule : rulelist) {
            Valuelist valuelist = rule.getValuesByParameter(parameter);
            for (Value value : valueslist) {
                subsets.addToListInMap(value, rule);
            }
            KeyList doneKeys = new KeyList();
            for (RuleList subset : subsets.getRuleLists()) {
                if (subset.length == 0) {
                    // No rule for this case!!!
                } else 
                if (subset.length == 1) {
                    // Singly proven
                    provenlist.add(subset.getFirst());
                } else
                if (!i.hasNext()) {
                    // Proven by priority
                    provenlist.add(subset.getFirst());
                } else
                    Key key = subset.combineRulesInListToKey()
                    if (!doneKeys.containts(key)) {
                        getProven(subset, provenlist, i);
                        doneKeys.add(key);
                    }
                }
            }
        }
    }
Eelco
  • 69
  • 4
  • I used that approach as a stepping stone to the algorithm I posted. The issue being that with 10 parameters and 100 values a parameter, it's rather likely to explode. Your also going to get a lot less benefit from the "merge equal parts" than you might think. The odds of having the exact same set of rules out of a 1000 rules is extremely low. Thus this approach is likely to be O(V^P) where V is the average number of values (100) and P is the number of parameters (10). I got around it by testing against a specific rule, which allows you to ignore rules that aren't a subset of that rule. – Nuclearman Jun 30 '14 at 03:30
  • I had hoped that the subset will be small (in case of a lot of different single values) or the sets will merge a lot (in case of a lot of Any-elements). But "linear cost per rule" sounds even better. Nuclearman, any change you can do a full pseudo-code of your algorithm? – Eelco Jun 30 '14 at 09:29
  • Sadly my Java is rather rusty and I can't seem to be able to come up with a Java-ish pseudocode that I was happy with. Thus I expanded my Pythonic pseudocode to actual Python solution instead. I do say linear in the number of rules, but it's actually more accurate to say linear in the number of elements. IE: O(R * P * V), where R is the number of rules (expected 1000), P is the number of parameters (expected 10) and V is the number of values per rule (expected 100). Although there are a few optimizations in implementation that can improve the performance somewhat. – Nuclearman Jun 30 '14 at 10:16
  • That's good. Though it should be values per parameter rather than values per rule, but the end result is the same. – Nuclearman Jun 30 '14 at 22:47