How to parse a boolean expression and load it into a class?

Question

I've got the following BoolExpr class:

class BoolExpr
{
    public enum BOP { LEAF, AND, OR, NOT };

    //
    //  inner state
    //

    private BOP    _op;
    private BoolExpr _left;
    private BoolExpr _right;
    private String   _lit;

    //
    //  private constructor
    //

    private BoolExpr(BOP op, BoolExpr left, BoolExpr right)
    {
        _op = op;
        _left  = left;
        _right = right;
        _lit = null;
    }

    private BoolExpr(String literal)
    {
        _op = BOP.LEAF;
        _left  = null;
        _right = null;
        _lit = literal;
    }

    //
    //  accessor
    //

    public BOP Op
    {
        get { return _op;  }
        set { _op = value; }
    }

    public BoolExpr Left
    {
        get { return _left;  }
        set { _left = value; }
    }

    public BoolExpr Right
    {
        get { return _right;  }
        set { _right = value; }
    }

    public String Lit
    {
        get { return _lit; }
        set { _lit = value; }
    }

    //
    //  public factory
    //

    public static BoolExpr CreateAnd(BoolExpr left, BoolExpr right)
    {
        return new BoolExpr(BOP.AND, left, right);
    }

    public static BoolExpr CreateNot(BoolExpr child)
    {
        return new BoolExpr(BOP.NOT, child, null);
    }

    public static BoolExpr CreateOr(BoolExpr left, BoolExpr right)
    {
        return new BoolExpr(BOP.OR, left, right);
    }

    public static BoolExpr CreateBoolVar(String str)
    {
        return new BoolExpr(str);
    }

    public BoolExpr(BoolExpr other)
    {
        // No share any object on purpose
        _op = other._op;
        _left  = other._left  == null ? null : new BoolExpr(other._left);
        _right = other._right == null ? null : new BoolExpr(other._right);
        _lit = new StringBuilder(other._lit).ToString();
    }

    //
    //  state checker
    //

    Boolean IsLeaf()
    {
        return (_op == BOP.LEAF);
    }

    Boolean IsAtomic()
    {
        return (IsLeaf() || (_op == BOP.NOT && _left.IsLeaf()));
    }
}

What algorithm should I use to parse an input boolean expression string like "￢((A ∧ B) ∨ C ∨ D)" and load it into the above class?

don't know if this would work, but I'd try to use [the Shunting Yard algorithm](http://en.wikipedia.org/wiki/Shunting-yard_algorithm) to convert the expression to Polish notation, which would be easy to parse — Cristian Lupascu, Jul 10 '13 at 10:20
Try [Reverse Polish notation](http://en.wikipedia.org/wiki/Reverse_Polish_notation). There's already plenty of its implementations and examples thereof in C# around the Internet. — S_F, Jul 10 '13 at 10:21
If it as me, I would use this opportunity to pick up F#, specifically discriminated unions : http://msdn.microsoft.com/en-us/library/dd233226.aspx — Tormod, Jul 10 '13 at 10:23
Various StackOverflow answers suggest using NCalc http://ncalc.codeplex.com/ However, those people just want the answer, not necessarily the creation of the expression tree. NCalc is open source, so you could look at its code. See http://www.codeproject.com/Articles/18880/State-of-the-Art-Expression-Evaluation for a related discussion. — ToolmakerSteve, Jul 10 '13 at 10:57
More suggestions for parsers in http://stackoverflow.com/questions/3079650/open-source-mathematical-expression-parser also in http://stackoverflow.com/questions/6516806/how-would-you-create-an-expression-parser-in-c — ToolmakerSteve, Jul 10 '13 at 11:01
I did it with the help of the Shunting Yard algorithm. You could check out my C# code [here](https://github.com/Genfood/Proviant). — Genfood, Apr 04 '19 at 19:30

score 53 · Accepted Answer · edited Apr 10 '14 at 21:56

TL;DR: If you want to see the code, jump to the second portion of the answer.

I would build a tree from the expression to parse and then traverse it depth first. You can refer to the wikipedia article about Binary Expression Trees to get a feel for what I'm suggesting.

Start by adding the omitted optional parentheses to make the next step easier
When you read anything that is not an operator or a parenthese, create a LEAF type node
When you read any operator (in your case not, and, or), create the corresponding operator node
Binary operators get the previous and following nodes as children, unary operators only get the next one.

So, for your example ￢((A ∧ B) ∨ C ∨ D), the algorithm would go like this:

￢((A ∧ B) ∨ C ∨ D) becomes ￢(((A ∧ B) ∨ C) ∨ D)

Create a NOT node, it'll get the result of the following opening paren as a child.
Create A LEAF node, AND node and B LEAF node. AND has A and B as children.
Create OR node, it has the previously created AND as a child and a new LEAF node for C.
Create OR node, it has the previously created OR and a new node for D as children.

At that point, your tree looks like this:

  NOT
   |
  OR
  /\
 OR D
 / \
AND C
/\
A B

You can then add a Node.Evaluate() method that evaluates recursively based on its type (polymorphism could be used here). For example, it could look something like this:

class LeafEx {
    bool Evaluate() {
        return Boolean.Parse(this.Lit);
    }
}

class NotEx {
    bool Evaluate() {
        return !Left.Evaluate();
    }
}

class OrEx {
    bool Evaluate() {
        return Left.Evaluate() || Right.Evaluate();
    }
}

And so on and so forth. To get the result of your expression, you then only need to call

bool result = Root.Evaluate();

Alright, since it's not an assignment and it's actually a fun thing to implement, I went ahead. Some of the code I'll post here is not related to what I described earlier (and some parts are missing) but I'll leave the top part in my answer for reference (nothing in there is wrong (hopefully!)).

Keep in mind this is far from optimal and that I made an effort to not modify your provided BoolExpr class. Modifying it could allow you to reduce the amount of code. There's also no error checking at all.

Here's the main method

static void Main(string[] args)
{
    //We'll use ! for not, & for and, | for or and remove whitespace
    string expr = @"!((A&B)|C|D)";
    List<Token> tokens = new List<Token>();
    StringReader reader = new StringReader(expr);

    //Tokenize the expression
    Token t = null;
    do
    {
        t = new Token(reader);
        tokens.Add(t);
    } while (t.type != Token.TokenType.EXPR_END);

    //Use a minimal version of the Shunting Yard algorithm to transform the token list to polish notation
    List<Token> polishNotation = TransformToPolishNotation(tokens);

    var enumerator = polishNotation.GetEnumerator();
    enumerator.MoveNext();
    BoolExpr root = Make(ref enumerator);

    //Request boolean values for all literal operands
    foreach (Token tok in polishNotation.Where(token => token.type == Token.TokenType.LITERAL))
    {
        Console.Write("Enter boolean value for {0}: ", tok.value);
        string line = Console.ReadLine();
        booleanValues[tok.value] = Boolean.Parse(line);
        Console.WriteLine();
    }

    //Eval the expression tree
    Console.WriteLine("Eval: {0}", Eval(root));

    Console.ReadLine();
}

The tokenization phase creates a Token object for all tokens of the expression. It helps keep the parsing separated from the actual algorithm. Here's the Token class that performs this:

class Token
{
    static Dictionary<char, KeyValuePair<TokenType, string>> dict = new Dictionary<char, KeyValuePair<TokenType, string>>()
    {
        {
            '(', new KeyValuePair<TokenType, string>(TokenType.OPEN_PAREN, "(")
        },
        {
            ')', new KeyValuePair<TokenType, string>(TokenType.CLOSE_PAREN, ")")
        },
        {
            '!', new KeyValuePair<TokenType, string>(TokenType.UNARY_OP, "NOT")
        },
        {
            '&', new KeyValuePair<TokenType, string>(TokenType.BINARY_OP, "AND")
        },
        {
            '|', new KeyValuePair<TokenType, string>(TokenType.BINARY_OP, "OR")
        }
    };

    public enum TokenType
    {
        OPEN_PAREN,
        CLOSE_PAREN,
        UNARY_OP,
        BINARY_OP,
        LITERAL,
        EXPR_END
    }

    public TokenType type;
    public string value;

    public Token(StringReader s)
    {
        int c = s.Read();
        if (c == -1)
        {
            type = TokenType.EXPR_END;
            value = "";
            return;
        }

        char ch = (char)c;

        if (dict.ContainsKey(ch))
        {
            type = dict[ch].Key;
            value = dict[ch].Value;
        }
        else
        {
            string str = "";
            str += ch;
            while (s.Peek() != -1 && !dict.ContainsKey((char)s.Peek()))
            {
                str += (char)s.Read();
            }
            type = TokenType.LITERAL;
            value = str;
        }
    }
}

At that point, in the main method, you can see I transform the list of tokens in Polish Notation order. It makes the creation of the tree much easier and I use a modified implementation of the Shunting Yard Algorithm for this:

static List<Token> TransformToPolishNotation(List<Token> infixTokenList)
{
    Queue<Token> outputQueue = new Queue<Token>();
    Stack<Token> stack = new Stack<Token>();

    int index = 0;
    while (infixTokenList.Count > index)
    {
        Token t = infixTokenList[index];

        switch (t.type)
        {
            case Token.TokenType.LITERAL:
                outputQueue.Enqueue(t);
                break;
            case Token.TokenType.BINARY_OP:
            case Token.TokenType.UNARY_OP:
            case Token.TokenType.OPEN_PAREN:
                stack.Push(t);
                break;
            case Token.TokenType.CLOSE_PAREN:
                while (stack.Peek().type != Token.TokenType.OPEN_PAREN)
                {
                    outputQueue.Enqueue(stack.Pop());
                }
                stack.Pop();
                if (stack.Count > 0 && stack.Peek().type == Token.TokenType.UNARY_OP)
                {
                    outputQueue.Enqueue(stack.Pop());
                }
                break;
            default:
                break;
        }

        ++index;
    }
    while (stack.Count > 0)
    {
        outputQueue.Enqueue(stack.Pop());
    }

    return outputQueue.Reverse().ToList();
}

After this transformation, our token list becomes NOT, OR, OR, C, D, AND, A, B.

At this point, we're ready to create the expression tree. The properties of Polish Notation allow us to just walk the Token List and recursively create the tree nodes (we'll use your BoolExpr class) as we go:

static BoolExpr Make(ref List<Token>.Enumerator polishNotationTokensEnumerator)
{
    if (polishNotationTokensEnumerator.Current.type == Token.TokenType.LITERAL)
    {
        BoolExpr lit = BoolExpr.CreateBoolVar(polishNotationTokensEnumerator.Current.value);
        polishNotationTokensEnumerator.MoveNext();
        return lit;
    }
    else
    {
        if (polishNotationTokensEnumerator.Current.value == "NOT")
        {
            polishNotationTokensEnumerator.MoveNext();
            BoolExpr operand = Make(ref polishNotationTokensEnumerator);
            return BoolExpr.CreateNot(operand);
        }
        else if (polishNotationTokensEnumerator.Current.value == "AND")
        {
            polishNotationTokensEnumerator.MoveNext();
            BoolExpr left = Make(ref polishNotationTokensEnumerator);
            BoolExpr right = Make(ref polishNotationTokensEnumerator);
            return BoolExpr.CreateAnd(left, right);
        }
        else if (polishNotationTokensEnumerator.Current.value == "OR")
        {
            polishNotationTokensEnumerator.MoveNext();
            BoolExpr left = Make(ref polishNotationTokensEnumerator);
            BoolExpr right = Make(ref polishNotationTokensEnumerator);
            return BoolExpr.CreateOr(left, right);
        }
    }
    return null;
}

Now we're golden! We have the expression tree that represents the expression so we'll ask the user for the actual boolean values of each literal operand and evaluate the root node (which will recursively evaluate the rest of the tree as needed).

My Eval function follows, keep in mind I'd use some polymorphism to make this cleaner if I modified your BoolExpr class.

static bool Eval(BoolExpr expr)
{
    if (expr.IsLeaf())
    {
        return booleanValues[expr.Lit];
    }

    if (expr.Op == BoolExpr.BOP.NOT)
    {
        return !Eval(expr.Left);
    }

    if (expr.Op == BoolExpr.BOP.OR)
    {
        return Eval(expr.Left) || Eval(expr.Right);
    }

    if (expr.Op == BoolExpr.BOP.AND)
    {
        return Eval(expr.Left) && Eval(expr.Right);
    }

    throw new ArgumentException();
}

As expected, feeding our test expression ￢((A ∧ B) ∨ C ∨ D) with values false, true, false, true for A, B, C, D respectively yields the result false.

"Recursive Descent Parser" would help this. Each "(" starts a sub-expression (calls an "Expression" method) that is ended by the matching ")". Some of the answer in http://stackoverflow.com/questions/2969561/how-to-parse-mathematical-expressions-involving-parentheses may help. — ToolmakerSteve, Jul 10 '13 at 14:27
This is not an assignment. I am just curious as to how to approach such problem. Any further details would be appreciated. — B Faley, Jul 11 '13 at 20:57
@Meysam I updated my answer with a lot more details, since you tell me it's not homework or anything. The entire thing should be a self-contained working example. — anthonyvd, Jul 12 '13 at 15:48
Thank you very much for the thorough answer. Isn't there any need to add the omitted optional parentheses to the input string before converting it to the polish notation? — B Faley, Jul 13 '13 at 10:39
@Meysam turns out, not really. All those operators have equal precedence (you'll notice if you rip apart my shunting yard that I actually treated unary operators as functions). Also, it easier to add the parentheses by using the resulting expression tree than before we create it :). Glad I could help! — anthonyvd, Jul 13 '13 at 12:48
@Meysam you could reason that basically, we use Polish notation in part to resolve the fact that some parentheses are omitted. It's 100% unambiguous by design without them. — anthonyvd, Jul 13 '13 at 13:14
Thanks for this. In order to process comparison operators instead of passing in true/false is it necessary to process the operands from right to left? I've noticed that `10 > 5` becomes `> 5 10`. — Ben Foster, Dec 25 '16 at 10:35
this is a great answer, but it has one serious flaw - it doesn't account for operators priority. e.g. `true OR false OR false AND false` would be evaluated to `false` while it should be `true` (`AND ` operation goes first, not last) — Konstantin Chernov, Mar 13 '17 at 21:30
@KonstantinChernov Indeed the implementation of the Shunting-Yard algorithm in this answer doesn't account for operator precedence (mainly for simplicity's sake). If you're interested, the wikipedia page on the algorithm explains how to take precedence into account! — anthonyvd, Mar 14 '17 at 15:16
thank you, @AnthonyVallée-Dubois, indeed it's an easy fix. I just wanted to warn out copypasters from throwing your code into their implementations (to avoid surprises) :) — Konstantin Chernov, Mar 14 '17 at 17:50

score 3 · Answer 2 · answered Jul 10 '13 at 13:29

From the algorithm point of view, to parse an expression, you need one stack.

We use two steps algorithm :

Lexing

The aim of lexing is to get 'keywords', 'identifiers' and 'separators' : - A keyword is 'if' 'then' 'else' '(' ')' '/\' '/' etc... - An identifiers in your case is 'A', 'B', 'C' etc... - A separator is blank space, tabulation, end of line, end of file, etc...

Lexing consist of using an automata. In lexing you will read your input string char by char. When you encouter a char that is compatible with one of your keyword, identifiers, separators, you start a sequence of char. When you encouter a separators you stop the sequence, look in a dictionnary of the sequence is a keyword (if not it is a identifier); then put the tuple [sequence, keyword or identifier/class] on the stack.

I leave you as exercice the case of small keyword '(' that can be also see as separators.

Parsing

Parsing is similar to grammar. In your case the only rules to check are comma, and binary operations, and just a simple identifier.

formaly :

expression::
  '(' expression ')'
  expression /\ expression
  expression \/ expression
  identifier

This can be write by a recursive function. First reverse your stack, then:

myParseExpression(stack, myC#ResultObject)
{
    if(stack.top = kewyord.'('  )
         then myParseOpenComma(all stack but top, myC#ResultObject)
    if(stack.top = keyword.'/\')
         then myParseBinaryAnd(stack, myC#ResultObject)
}

myParseOpenComma(stack, myC#ResultObject)
{
 ...
}

myParseBinaryAnd(stack, myC#ResultObject)
{
 myNewRigthPartOfExpr = new C#ResultObject
 myParseExpression(stack.top, myNewRigthPartOfExpr)
 remove top of stack;
 myNewLeftPartOfExpr = new C#ResultObject
 myParseExpression(stack.top, myNewLeftPartOfExpr)

 C#ResultObject.add("AND", myNewRigthPartOfExpr, myNewLeftPartOfExpr)
}

...

There is multiple function that share recursion on each other. As exercice, try to add the negation.

Lexing is traditionnally done by a lexer (like lex tool).
Parsing is traditionnaly done by a parser (like bison tool).
Tool allow write of thoses function more like I have done in the formaly expression.

Thoses aspect are fundamental of program compilation. Coding thoses thing will improve you a lot because it is hard and fundamental.

I will not put true C# code, because the question look like an assignement/homework. — Galigator, Jul 10 '13 at 13:31

How to parse a boolean expression and load it into a class?

2 Answers2

Linked

Related