9

I need a tree structure that supports "and" and "or"ing. For example, given a regular expression like ab|c(d|e) I want to turn that into a tree.

So, at first we have two "or" branches... it can either go down ab, or c(d|e). If you head down the ab branch, you get two nodes, a and b (or a followed by b, whatever). Then if you go down the c(d|e) branch, you get c and (d|e), then (d|e) is split into d or e.

Making a tree structure is easy, you just have something like

class Node {
    string element;
    Node[] children;
}

But then how do you know if the children should be "anded" or "ored"? I guess each level of the tree should alternate between "anding" and "oring"

Does that make sense? Can anyone suggest a structure for this?


A few people have suggested storing the "operator" on the node, which is fine, but isn't there a way to take advantage of the fact that each level always alternates or,and,or,and,...?

Edit: Not quite sure why people keep assuming this is a binary tree. It's not. I was hoping the tiny code snippet would tip you off. The example just happens to have only 2 branches.


Currently leaning towards this:

abstract class Node { }

class DataNode : Node
{
    string data;
}

abstract class OpNode : Node
{
    Node[] children;
}

class OrNode : OpNode { }
class AndNode : OpNode { }
mpen
  • 272,448
  • 266
  • 850
  • 1,236
  • I think the alternating AND / OR levels is just a coincidence for this example. In general, you should be able to mix AND and OR on the same level. See my answer. – mbeckish Dec 07 '10 at 21:30
  • @mbeckish: Is it? Can you provide an example where you *wouldn't* alternate? – mpen Dec 07 '10 at 21:33
  • Also, you don't need an array of children (unless you are going to support N-ary operators). – mbeckish Dec 07 '10 at 21:34
  • @Ralph: a|b|c is one trivial example. – mbeckish Dec 07 '10 at 21:35
  • @mbeckish: No... that's just one layer with 3 branches. No one said this was a binary tree. That's why I used an array in my example. – mpen Dec 07 '10 at 21:36
  • @Ralph - If you want to collapse multiple binary operators into a single N-ary node, then yes, you might be able to assume alternating layers. – mbeckish Dec 07 '10 at 21:40
  • @Ralph - Regarding your Edit, I think many people would model this as a binary tree because the operators are binary (or unary if you support NOT)! – mbeckish Dec 07 '10 at 21:43
  • @mbeckish: The first sentence reads "given the regular expression". This hints that I'm using regex operators, not any other binary kind. – mpen Dec 07 '10 at 21:48
  • @Ralph - Didn't parse that little bit of text. Plus, the title says "and-or tree". Maybe you can change it to make it more obvious that you want to parse general regular expressions into a tree (which I think would add a lot more complication). – mbeckish Dec 07 '10 at 21:56
  • @Ralph - Plus, "ab" is not "a AND b" in a regular expression - it is "a followed by b". Now I'm really confused about what you're asking for. – mbeckish Dec 07 '10 at 21:57
  • @mbeckish: Didn't want to put *too* much emphasize on that... I've already managed to parse the RE without using any trees... well, a 2-level tree + recursion, but I think I need a new structure to add some more features. I only need support for the 2 operators (I think). – mpen Dec 07 '10 at 21:58
  • 1
    how you make (a|b|c)(de) with every level of tree and or or only ? –  Dec 07 '10 at 22:07
  • @HPT: You put `d`, `e`, and an "or" node all on the same row (and-d together), then `a`,`b`,`c` on the next row (or'd). It works, but you've hinted at another problem that we've lost the grouping information around `(de)` (it's as though the brackets aren't there... which is perfectly fine until you have to do back-references) – mpen Dec 07 '10 at 22:20
  • @Ralph: Regarding your last comment: now you're just playing around with the input rather than parsing it. If you want to reduce the initial string before parsing it, that's one option, but it's a bad one. PS - What class is this for? – Nick Vaccaro Dec 07 '10 at 22:29
  • @Norla: What do you mean? I was going to parse the string into a tree, and then evaluate it. Why is that a bad approach? That's how parsers are written, apparently. This is all for http://stackoverflow.com/questions/4298541 – mpen Dec 07 '10 at 22:39

7 Answers7

13

Think of a tree structure where every node represents a boolean expression that can be evaluated to be either true or false - in your case a regular expression (match or non-match). The tree structure itself represents AND and OR: Each route, starting at the root node and ending with a node that has no further children, is a conjunction of expressions, representing AND. The tree

    A
   /
  B
 /
C

would represent A AND B AND C.

Whenever a node has more than 1 child node, there is an OR (disjunction), branching into several routes:

    A
   / \
  B   D
 /
C

represents A AND ((B AND C) OR D)

So you do not even need to store the operators anywhere.

In your example you have the expression ab|c(d|e) so there is no common root expression to evaluate; I suggest the root in this case is simply true and the tree would look like:

   true
   / \
  A   C
 /   / \
B   D   E

For a custom tree class in c# look here Tree data structure in C# or search on or make one of your own.

Community
  • 1
  • 1
Wintermute
  • 394
  • 4
  • 19
  • I like this!! It removes the need to keep track of the alternation or operators, and its clean and easy to read. – mpen Feb 05 '12 at 18:10
5
abstract class Node { }

class DataNode : Node {
    public string Data { get; }

    // details
}

class OperatorNode : Node {
    public Node Left { get; }
    public Node Right { get; }
    public BinaryOperator Operator { get; }

    // details
}

abstract class BinaryOperator { // details }

class Or : BinaryOperator { // details }
class And : BinaryOperator { // details }
jason
  • 236,483
  • 35
  • 423
  • 525
  • 1
    So, how would the example ab|c(d|e) be implemented? – mbeckish Dec 07 '10 at 21:24
  • Also note that it's not a binary tree, but that's an easy fix. And where is the "data" stored? Only the leaves have data... just put it in the node and leave it as null unless its a leaf? – mpen Dec 07 '10 at 21:28
5

You could have 2 types of nodes: operator nodes and variable nodes.

The leaves of your tree would all be variable nodes; all other nodes would be operator nodes.

Binary operator nodes would have two children. Unary operator (like NOT) nodes would have 1 child.

For your example ab|c(d|e):

      OR
  /         \
 AND       AND
 / \      /  \
a   b    c   OR
           /  \
          d    e
mbeckish
  • 10,485
  • 5
  • 30
  • 55
  • Programmatically this seems straight forward to implement. Parsing would involve starting and the left most leaf and traversing nodes. – TSG Dec 29 '15 at 16:26
4

Is there anything wrong with this:

enum Operation
{
    None,
    And,
    Or
}

class Node {
    string element;
    Node[] children;
    Operation operation;
}

Edit:

As an example of how ab|c(d|e) would look something like this:

Node root = new Node
        {
            operation = Operation.Or,
            children = new Node[]
            {
                new Node
                {
                    operation = Operation.And,
                    children = new Node[]
                    {
                          new Node{ element = "a" },
                          new Node{ element="b" }
                    }
                },
                new Node
                {
                    children = new Node[]
                    {
                        new Node{ element = "c"},
                        new Node
                        {
                            operation= Operation.Or,
                            children = new Node[]
                            {
                                new Node{ element= "d"},
                                new Node{element = "e"}
                            }
                        }
                    }
                }
            }
        };
Mark Avenius
  • 13,679
  • 6
  • 42
  • 50
  • How would the example ab|c(d|e) be implemented? – mbeckish Dec 07 '10 at 21:32
  • @mbeckish: I like your approach better though, separating my one `Node` type into two types (and each `new Node` would be replaced with `new OperatorNode` or `new ElementNode` – Mark Avenius Dec 07 '10 at 22:36
2

I did this just a few days ago using ANTLR. ANTLR provided me with a grammar which is represented as an AST Abstract Syntax Tree as you just described and it generated c# code that could handle that grammar.

It's quite nice and elegant. Here are a few example.

phillip
  • 2,618
  • 19
  • 22
1

Just to throw in a slightly different one

interface Node
{
    // top level operations here
}

class OpNode : Node
{
    public Node Left { get; set; }
    public Node Right { get; set; }
}

class AndNode : OpNode
{
    public AndNode(Node left, Node right)
    {
        Left = left;
        Right = right;
    }
    public override string ToString()
    {
        return "(" + Left.ToString() + " & " + Right.ToString() + ")";
    }
}

class OrNode : OpNode
{
    public OrNode(Node left, Node right)
    {
        Left = left;
        Right = right;
    }
    public override string ToString()
    {
        return "(" + Left.ToString() + " | " + Right.ToString() + ")";
    }
}

class DataNode<T> : Node
{
    T _data;
    public DataNode(T data)
    {
        _data = data;
    }
    public override string ToString()
    {
        return _data.ToString();
    }
}
Jaime
  • 6,736
  • 1
  • 26
  • 42
  • Sounds like mbeckish's solution, but you took it one step further and split the op nodes in two different types as well. – mpen Dec 07 '10 at 21:40
1

How about something this simple:

class OrNode {
  string element;
  AndNode[] children;
} 

class AndNode {
  string element;
  OrNode[] children;
} 

Each class could have its own evaluate() which would AND or OR all the children as needed

You may still want to have a parent superclass so that your code could hold generic nodes without worrying about whether the first one was AND or OR.

AShelly
  • 34,686
  • 15
  • 91
  • 152
  • This is the only solution so far that forces alternation...and it's simple. I like it. – mpen Dec 07 '10 at 21:52