To keep compile times manageable and to reuse grammars, I've split my grammar into several sub-grammars which are called in sequence. One of them (call it the SETUP grammar) offers some configuration of the parser (via a symbols parser), so later sub-grammars logically depend on that one (again via different symbols parsers). So, after SETUP is parsed, the symbols parsers of the following sub-grammars need to be altered.

My question is, how to approach this efficiently while preserving loose coupling between the sub grammars?

Currently I see only two possibilities:

  • The on_success handler of the SETUP grammar, which could do the work, but this would introduce quite some coupling.
  • After the SETUP, parse everything into a string, build up a new parser (from the altered symbols) and parse that string in a second step. This would leave quite some overhead.

What I would like to have is an on_before_parse handler, which could be implemented by any grammar that needs to do some work before each parse. From my point of view, this would introduce less coupling, and some setup of the parser could come in handy in other situations, too. Is something like this possible?

Update:

Sorry for being sketchy, that wasn't my intention.

The task is to parse an input I with some keywords like #task1 and #task2. But there will be cases where these keywords need to be different, say $$task1 and $$task2.

So the parsed file will start with

setup {
  #task1=$$task1
  #task2=$$task2
}

realwork {
  ...
}

Some code sketches: Given is a main parser, consisting of several (at least two) parsers.

template<typename Iterator>
struct MainParser: qi::grammar<Iterator, Skipper<Iterator>> {

  MainParser() : MainParser::base_type(start) {
    start = setup >> realwork;
  }

  Setup<Iterator>    setup;
  RealWork<Iterator> realwork;

  qi::rule<Iterator, Skipper<Iterator> > start;
};

Setup and RealWork are themselves parsers (my sub parsers from above). During the setup part, some keywords of the grammar may be altered, so the setup part has a qi::symbols<char, keywords> rule. In the beginning these symbols will contain #task1 and #task2. After parsing the first part of the file, they contain $$task1 and $$task2.

Since the keywords have changed and since RealWork needs to parse I, it needs to know about the new keywords. So I have to transfer the symbols from Setup to RealWork during the parsing of the file.

The two approaches I see are:

  • Make the Setup aware of RealWork and transfer the symbols from Setup to RealWork in the qi::on_success handler of Setup. (bad, coupling)
  • Switch to two parsing steps. start of MainParser will look like

    start = setup >> unparsed_rest
    

    and there will be a second parser after MainParser. Schematically:

    SymbolTable Table;
    string Unparsed_Rest;
    MainParser.parse(Input, (Unparsed_Rest, Table));
    
    RealWorkParser.setupFromAlteredSymbolTable(Table);
    RealWorkParser.parse(Unparsed_Rest);
    

    Overhead of several parsing steps.

So, up to now, attributes do not come into play. It is just about changing the parser at parse time to handle several kinds of input files.

My hope is for a handler qi::on_before_parse, analogous to qi::on_success. The idea is that this handler would be triggered each time the parser starts parsing an input. Theoretically it is just an interception at the beginning of parsing, like the interceptions we already have with on_success and on_error.

Mike M
    I've done my best with some general comments. Hope these things help you on track. If not, I suggest you come back with a _concrete_ question. I'm much more at home with code than 'ideas' because ideas _often_ mean something else to you than to me. – sehe Jul 22 '13 at 19:42
  • @sehe: Thank you very much for your efforts. I've tried to be more specific about how the grammar is divided and how parsing should work. – Mike M Jul 22 '13 at 20:43

1 Answer

Sadly, you showed no code, and your description is a bit... sketchy. So here's a fairly generic answer that addresses some of the points I was able to distill from your question:

Separation of concerns

It sounds very much like you need to separate AST building from transformation/processing steps.

Parser composition

Of course you can compose grammars. Simply compose grammars as you would rules and hide the implementation of these grammars in any traditional way you would (pImpl idiom, const static internal rules, whatever fits the bill).

However, the composition usually doesn't require an 'event' driven element: if you feel the need to parse in two phases, it sounds to me you're just struggling to keep the overview, but recursive descent or PEG grammars are naturally well-suited to describe grammars like that in one swoop (or one pass, if you will).

However, if you find that

(a) your grammar gets complicated
(b) or you want to be able to selectively plug in subgrammars depending on runtime features

you could consider

  1. The Nabialek trick (I've shown/mentioned this on several occasions in my [tag:boost-spirit] answers on this site)
  2. You could build rules dynamically (this is not readily recommended because you'll run into deadly traps having to do with copying Proto expression trees, which leads to dangling references). I have also shown some answers doing this on occasion:

    REPEAT: don't try this unless you know how to detect UB and fix things with Proto

Hope these things help you on track. If not, I suggest you come back with a concrete question. I'm much more at home with code than 'ideas' because ideas often mean something else to you than to me.

sehe