25

Any ideas for reducing boost::spirit compile time?

I have just ported a flex parser to boost::spirit. The EBNF has about 25 rules.

The result runs well and the runtime performance is fine.

The problem is that compile takes for ever! It takes about ten minutes and requires almost a gigabyte of memory. The original flex parser compiled in a few seconds.

I am using boost version 1.44.0 and Visual Studio 2008.


In Joel de Guzman's article 'Best Practices' it says

Rules with complex definitions hurt the compiler badly. We’ve seen rules that are more than a hundred lines long and take a couple of minutes to compile

Well I do not have anything near that long, but my compile still takes way more than a couple of minutes

Here is the most complex part of my grammar. Is it a candidate for being broken up into smaller parts, in some way?

rule
    =   (   tok.if_ >> condition  >> tok.then_ >> *sequel  )                            [ bind( &cRuleKit::AddRule, &myRulekit ) ]
    |   (   tok.if_ >> condition  >> tok.and_ >> condition >> tok.then_ >> *sequel  )   [ bind( &cRuleKit::AddRule, &myRulekit ) ]
    ;

condition
    =   ( tok.identifier >> tok.oper_ >> tok.value )                                    [ bind( &cRuleKit::AddCondition, &myRulekit, _pass, _1, _2, _3 ) ]
    |   ( tok.identifier >> tok.between_ >> tok.value >> "," >> tok.value )             [ bind( &cRuleKit::AddConditionBetween, &myRulekit, _pass, _1, _3, _4 ) ]
    ;

sequel
    =   ( tok.priority_ >> tok.high_ )      [ bind( &cRuleKit::setPriority, &myRulekit, 3 ) ]
    |   ( tok.priority_  )                  [ bind( &cRuleKit::setPriority, &myRulekit, 2 ) ]
    |   ( tok.interval_ >> tok.value )      [ bind( &cRuleKit::setInterval, &myRulekit, _2 ) ]
    |   ( tok.mp3_ >> tok.identifier )      [ bind( &cRuleKit::setMP3, &myRulekit, _2 ) ]
    |   ( tok.disable_ )                    [ bind( &cRuleKit::setNextRuleEnable, &myRulekit, false ) ]
    ;

By commenting out parts of the grammar, I discovered which part the compiler was spending most time with.

set_reading
    =   tok.set_reading >> +attribute_reading
    ;

attribute_reading
    =   ( tok.name_ >> tok.identifier )
        [ bind( &cPackage::Add, &myReadings, _pass, _2 ) ]
    |   ( tok.nmea_ >> tok.identifier )
        [ bind( &cPackage::setNextNMEA, &myReadings, _2 ) ]
    |   ( tok.column_ >> tok.integer )
        [ bind( &cPackage::setNextColumn, &myReadings, _2 ) ]
    |   ( tok.precision_ >> tok.value )
        [ bind( &cPackage::setNextPrecision, &myReadings, _2 ) ]
    |   ( tok.unit_ >> tok.identifier )
        [ bind( &cPackage::setNextUnit, &myReadings, _2 ) ]
    |   ( tok.value_ >> tok.identifier )
        [ bind( &cPackage::setNextValue, &myReadings, _2 ) ]
    |   ( tok.qualifier_ >> tok.identifier >> tok.qual_col_ >> tok.integer )
        [ bind( &cPackage::setNextQualifier, &myReadings, _2, _4 ) ]
    ;

I wouldn't call it complex, but it is certainly the longest rule. So I thought I would try splitting it up, like this:

set_reading
    =   tok.set_reading >> +attribute_reading
    ;

attribute_reading
    =   attribute_reading_name
    |   attribute_reading_nmea
    |   attribute_reading_col
    |   attribute_reading_precision
    |   attribute_reading_unit
    |   attribute_reading_value
    |   attribute_reading_qualifier
    ;



attribute_reading_name
    =   ( tok.name_ >> tok.identifier )     [ bind( &cPackage::Add, &myReadings, _pass, _2 ) ]
    ;
attribute_reading_nmea
    =   ( tok.nmea_ >> tok.identifier )     [ bind( &cPackage::setNextNMEA, &myReadings, _2 ) ]
    ;
attribute_reading_col
    =   ( tok.column_ >> tok.integer )      [ bind( &cPackage::setNextColumn, &myReadings, _2 ) ]
    ;
attribute_reading_precision
    =   ( tok.precision_ >> tok.value )     [ bind( &cPackage::setNextPrecision, &myReadings, _2 ) ]
    ;
attribute_reading_unit
    =   ( tok.unit_ >> tok.identifier )     [ bind( &cPackage::setNextUnit, &myReadings, _2 ) ]
    ;
attribute_reading_value
    =   ( tok.value_ >> tok.identifier )    [ bind( &cPackage::setNextValue, &myReadings, _2 ) ]
    ;
attribute_reading_qualifier
    =   ( tok.qualifier_ >> tok.identifier >> tok.qual_col_ >> tok.integer ) [ bind( &cPackage::setNextQualifier, &myReadings, _2, _4 ) ]
    ;

This saves several minutes from the total compile time!!!

Strangely, the peak memory requirement remains the same, it is just required for less time

So, I am feeling a bit more hopeful that all my efforts in learning boost::spirit are going to be worthwhile.

I do think it is a bit strange that the compiler needs to be so carefully guided in this way. I would have thought a modern compiler would have noticed that this rule was just a list of independent OR rules.


I have spent the best part of seven days learning boost::spirit and porting a small, but real world, parser from flex. My conclusion is that it works and the code is very elegant. Unfortunately, naive use by simply expanding the tutorial example code for a real application quickly overburdens the compiler - memory and time taken for a compile becomes completely impractical. Apparently there are techniques for managing this problem but they require arcane knowledge which I do not have the time to learn. I guess I will stick to flex which may be ugly and old-fashioned but is relatively simple and lightning fast.

Claudiu Cruceanu
  • 135
  • 1
  • 1
  • 10
ravenspoint
  • 19,093
  • 6
  • 57
  • 103

2 Answers2

16

One trick I could suggest is to separate the compilation of the constructors of both, your lexer and your grammar. The easiest way to achieve this is to leave only the declaration of those constructors in their respecitive header files and to move the definition of those functions into separate translation units. For instance:

grammar.hpp:

template <typename Iterator>
struct grammar : qi::grammar<Iterator>
{
    grammar();   // declaration only
    // ...
};

grammar_def.hpp:

// This file should not contain anything else.
#include "grammar.hpp"

// Definition of constructor.
template <typename Iterator>
grammar<Iterator>::grammar()
{
    // initialize your rules here
}

grammar.cpp:

// This file should not contain anything else.
#include "grammar_def.hpp"

// Explicitly instantiate the constructor for the iterator type
// you use to invoke the grammar (here, as an example I use 
// std::string::const_iterator).
typedef std::string::const_iterator iterator_type;
template grammar<iterator_type>::grammar();

Do the same thing for the lexer object.

This approach requires a bit more work than the straight method, but it allows to distribute the memory and time requirements for the overall compilation. Another advantage of this approach is that any change in the grammar constructor does not require the recompilation of anything except the file grammar.cpp.

Another advice for the lexer: try to minimize the use of token_def<> instances as much as possible. You need to use token_def<> only when you want to access the token value as an attribute during parsing. In all other cases you might get away with lex::string or lex::char_ to define your tokens.

hkaiser
  • 11,403
  • 1
  • 30
  • 35
  • Please, can you point to an example using lex::string to define a token? I get compiler errors when I simply replace token_def<> with lex::string – ravenspoint Apr 01 '11 at 14:30
  • In grammar_def.hpp you write "This file should not contain anything else". What about files that might be needed for the definition (but not the declaration) of the rules, such as phoenix_core.hpp, phoenix_operator.hpp? – engineerX Nov 27 '12 at 17:20
13

I have to come to the conclusion that boost:spirit, elegant as it is, is not a viable option for many real world parsing problems due to the lengthy compile times that even experts cannot fix.

It is often best to stick to something like flex, which may be ugly and old-fashioned, but is relatively simple and lightning fast.

As an example of what I consider a 'real world' problem here is the railroad diagram of the most important part of a parser that flex compiles in a couple of seconds, but boost:spirit is still chugging away on after ten minutes

enter image description here

eepp
  • 7,255
  • 1
  • 38
  • 56
ravenspoint
  • 19,093
  • 6
  • 57
  • 103
  • Thank you for contributing your conclusion, however unpopular it may be. It was useful for me, as I'm trying to approach the use of spirit. – Homer6 Aug 26 '12 at 05:02
  • 4
    @ravenspoint After the 20th-or-so time I'ven run into this answer I've taken the liberty to change your wording a bit, so that it isn't unnecessarily "absolute" anymore. And in the resulting form, I can lend my +1. I love Spirit, but I certainly won't use it for every parsing job – sehe Nov 06 '13 at 07:52
  • @sehe I am curious - do you actually use spirit for a real world parsing problem, say something with at least 20 rules? How long does your compile take? – ravenspoint Nov 06 '13 at 12:06
  • 1
    @ravenspoint My grammar compiles take anywhere from 20s to 2 minutes, but that's never very relevant as I rarely compile the parsers. I have build rules, ccache/pch where applicable. My {refactor/[build]/test}/greenbar builds take under 10s typically. – sehe Nov 06 '13 at 16:19
  • Compilation is extremely relevant during development and debugging of a parser. I would find 2 minutes wait for a compile hard to tolerate. My 25 rule parser needed 10 minutes - just impossible. The same flex parser needed a couple of seconds. – ravenspoint Nov 06 '13 at 16:40
  • 2
    @ravenspoint I guess that's the tipping point for me: I use Spirit in an agile way. I rarely spend much time developing the parser. I used to, though, back when I used CoCo/R and flex. That was basically because I was mixing parsing with processing code. You could say that I've "adapted" my workflow, but I don't regret this. It also means that Spirit can be quite frustrating if you're not already (very) experienced with it :( – sehe Nov 06 '13 at 17:00
  • PS. I recently told @namezero about my loves and dislike of Boost Spirit in this chat: [you might want to check that](http://chat.stackoverflow.com/transcript/message/12697797#12697797) too – sehe Nov 06 '13 at 17:03
  • @ravenspoint out of curiosity: Re. "My 25 rule parser needed 10 minutes" - do you still have it? I'd like to tackle it just so I can be aware of it, make a mental note of the case and possibly think of a workaround. – sehe Nov 06 '13 at 19:11
  • From 2++ years ago! Well I haven't looked at it in all that time, but my VCS still has it. https://www.dropbox.com/sh/o3sz3jng3egrafu/-X1m58AO5p – ravenspoint Nov 06 '13 at 19:35
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/40663/discussion-between-sehe-and-ravenspoint) – sehe Nov 06 '13 at 20:23