-1

I'm trying to write an expression parser. One part I'm stuck on is breaking down an expression into blocks via its appropriate order of precedence.

I found the order of precedence for C++ operators here. But where exactly do I split the expression based on this?

I have to assume the worst of the user. Here's a really messy over-exaggerated test example:

if (test(s[4]) < 4 && b + 3 < r && a!=b && ((c | e) == (g | e)) ||
   r % 7 < 4 * givemeanobj(a & c & e, b, hello(c)).method())

Perhaps it doesn't even evaluate, and if it doesn't I still need to break it down to determine that.

It should break down into blocks of singles and pairs connected by operators. Essentially it breaks down into a tree-structure where the branches are the groupings, and each node has two branches.

Following the order of precedence the first thing to do would be to evaluate the givemeanobj(), however that's an easy one to see. The next would be the multiplication sign. Does that split everything before the * into a separate , or just the 4? 4 * givemeanobj comes before the <, right? So that's the first grouping?

Is there a straightforward rule to follow for this?

Alasdair
  • 13,348
  • 18
  • 82
  • 138
  • Are you trying to write a C++ interpreter? Many exist: https://stackoverflow.com/questions/69539/have-you-used-any-of-the-c-interpreters-not-compilers – Ho1 Mar 27 '21 at 09:30
  • *"first thing to do would be to evaluate the `givemeanobj()`, however that's an easy one to see"* Is it? You need to evaluate `test(s[4])` first, because the lhs of `&&` has to be evaluated before the rhs. – HolyBlackCat Mar 27 '21 at 09:31
  • You also need to account for (1) associativity of operators (2) logical operators (`&&` and `||`) do short-circuiting (3) in function calls - the commas are not the comma operator (4) some numeric operators (e.g. `%`) and bitwise operators (`&`, `|`) are only applicable to integral types. [I'm assuming none of the variables in your "test example" are instances of classes with overloaded operators]. – Peter Mar 27 '21 at 09:38
  • A [recursive descent parser](https://en.wikipedia.org/wiki/Recursive_descent_parser) may be what you are looking for. The "analysis" (dissecting) of a complex expression is handled by the recursion. – Peter - Reinstate Monica Mar 27 '21 at 09:43
  • Also, as noted on the page you are linking, "_Precedence and associativity are compile-time concepts and are independent from [order of evaluation](https://en.cppreference.com/w/cpp/language/eval_order), which is a runtime concept._" – heap underrun Mar 27 '21 at 11:05

1 Answers1

1

Is there a straightforward rule to follow for this?

Yes, use a parser generator such as ANTLR. You write your language specification formally, and it will generate code which parses all valid expressions (and no invalid ones). ANTLR is nice in that it can give you an abstract syntax tree which you can easily traverse and evaluate.

Or, if the language you are parsing is actually C++, use Clang, which is a proper compiler and happens to be usable as a library as well.

John Zwinck
  • 239,568
  • 38
  • 324
  • 436