4

I am trying to write a small and simple parser for symbolic specifications in C++. I first tried using tools such as Yacc/Bison (which I already know) and ANTLR, but these are much too unwieldy and complex for what I am trying to do.

My language has the following binary operators: = (specification), + (sum), . (product); and the following unary operators: Seq (sequence of an expression), Z(###) (unit element which can be numbered), and E(###) (neutral element which can also be numbered). It should also contain variables. I think the binding priority on sum and product should be the standard priority.

Here is an example of a system of rules:

A = Seq B
B = Z(1) + Z(2).Z(2).C
C = E(1) + Z(4).Z(4).B

which I would like to parse to a associative map, indexing expression trees (Seq B) to a variable name (A).

After several auxiliary attempts (using Python, for instance), I have identified an ideal tool for this, Spirit in the Boost Library. I would like something more tightly integrated in the code. And the following StackOverflow gives most of the elements that I should need:

Boolean expression (grammar) parser in c++

However Spirit is pretty complicated, and I am not helped by the fact that I am not comfortable with the advanced C++ traits that the library uses - or more to the fact, I am not sure how to correct errors given by my compiler. My latest attempt:

https://gist.github.com/anonymous/354bb84e54042a3b54f3

It will not compile. Even if it did, I am not sure I understand how I would manipulate the parsed structure. I am not sure if I am on the right track.

Can somebody help get back on point? Or suggest an alternate reasonable solution?

UPDATE 1: Here is a more precise description of what I want.

A list of rules

"VAR = EXPR"

Where EXPR is:

EXPR :=  EXPR + EXPR
      |  EXPR . EXPR
      |  Seq EXPR
      |  Z<digits*>       (defaults to zero if not placed)
      |  E<digits*>       (defaults to zero if not placed)
      |  VAR

VAR  := alnum+     (excludes Z## and E##)

For instance:

PP = Z + L1 . R1 + L2 . R2 + L3 . R3 + Seq Z3 . Seq Z4
L1 = Z10.Z9
R1 = Z12.PP + E4
...

Would give me an associative table:

[ "PP" -> parsing tree that I can navigate for (Z + L1...)
  "L1" -> parsing tree for ... 
]

Where PP is:

PP = (((((Z + (L1 . R1)) + (L2 . R2)) + (L3 . R3))) + ((Seq Z3) . (Seq Z4)))

I should say that cv_and_he's debugging was already incredibly useful, but: -- I am not sure my grammar is correct; -- I don't know how to parse the list from a system of equations; -- I have no clue how to exploit/navigate the parsing tree once its been parsed.

My endgoal is to write recursive functions that traverse the tree and make some simulations accordingly.

UPDATE 2: I have managed to get most of the parsing done:

https://gist.github.com/anonymous/b6a63a1aa5caa6461dd6

I wonder if it is correct, or if I am missing a mistake. After that, what I cannot figure out is: - how to best parse a list of such rules (instead of just one); - how to traverse the parsed rules, and modify states (for instance, I would like to renumber the variables, and associate each variable with an integer that is always the same - lambda renumbering).

UPDATE 3: I guess part of my question now is: how can I make a "static visitor" not static, i.e., how can I make it have a state, that is modifiable.

UPDATE 4: I found an answer to the last question here:

How can I use the boost visitor concept with a class containing state variables?

I wonder if it is a sound solution. And I am still left wondering what is the best way of parsing a list of rules.

Community
  • 1
  • 1
Jérémie
  • 1,353
  • 9
  • 19
  • 1
    [Here](http://coliru.stacked-crooked.com/a/45ab6c68fa63d81f) you can find a version that compiles, sadly the grammar seems to be wrong, and it won't parse your rules. Until Boost 1.56 releases defining `BOOST_SPIRIT_USE_PHOENIX_V3` solves lots of problems when you are using a "modern" compiler. – llonesmiz May 09 '14 at 18:04
  • Thank you! I am not sure how to specify the grammar, to be honest, so if you know how to fix it... :) – Jérémie May 09 '14 at 18:06
  • 1
    [This](http://coliru.stacked-crooked.com/a/3b3d8f965d9fc022) grammar seems to work, but you shouldn't trust that it's correct. I can translate a grammar to spirit pretty well, but I know very little about defining them. Be patient I'm sure a better answer is on the way. – llonesmiz May 09 '14 at 18:48
  • @cv_and_he I think that's about what we can do, in absense of semantics/formal grammar definition. I think it's fine. (OP: if you explain what you try to achieve, you can get help beyond the semicolons and other tecnicalities :)) – sehe May 09 '14 at 21:21
  • If performance is not an issue, you'll probably find that writing a recursive descent parser is the easiest way to go. – kec May 09 '14 at 21:40
  • @sehe I added additional details; is that what you are talking about? – Jérémie May 10 '14 at 01:29
  • @kec Yes, that's the most convenient way to go, but I am doing minor variations on this theme so often that I feel like I would like to not reinvent the wheel each time. Spirit seems convenient because it is part of the code, not a separate animal like Yacc/Bison or ANTLR. – Jérémie May 10 '14 at 01:30
  • question: do you know how to classify the grammar that you want to parse ? you want a lexer + parser approach or a scannerless one ? In your question you are not highlighting the main properties of what you are looking for, consider posting on one of the following sites 1. http://linguistics.stackexchange.com/ 2. http://cstheory.stackexchange.com/ 3. http://cs.stackexchange.com/ – user2485710 May 10 '14 at 09:59

0 Answers0