4

I wrote some grammar with boost::spirit::qi::rule to parse the internet packet. the grammar is something like:

qi::rule<Iterator> start, request, response, status, query ;
start = (request | response | status | query) >> lit("\r\n");

to improve the performance, user maybe want to skip some rules in the runtime, e.g. ignore "response","status","query" and only try to match request, so the rule will change to:

start = (request ) >> lit("\r\n"); 

is it possible to do that? e.g, is there a function like "disable()" to just disable the rule "response", "status" and "query"?

Rui Zhou
  • 209
  • 1
  • 3
  • 9

2 Answers2

6

The most natural approach would be to just use a different parser for your more constrained occasions.

Also, if performance is so important that you can't even spare 3 or 4 extra character comparisons

  • you should probably do a handwritten parser (in general, Spirit's automatic attribute handling isn't always optimal) or
  • consider statically optimized scanning (so Boost Xpressive or Spirit Lex)
  • you might perhaps be doing something wrong (if you have a suboptimal grammar you can have much backtracking kill the performance - much like degenerate Regular Expressions (https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS))

That said, here are some options:

1. Use a variable rule through qi::lazy

#include <boost/spirit/include/qi.hpp>

namespace qi    = boost::spirit::qi;
namespace phx   = boost::phoenix;

int main()
{
    typedef std::string::const_iterator It;
    qi::rule<It> 
        request  = "request",
        response = "response",
        status   = "status",
        query    = "query",
        // 
        allowed;

    static qi::rule<It> const start = qi::lazy(phx::ref(allowed)) >> qi::lit("\r\n");
    static const auto test = [](std::string const& input) { return qi::parse(begin(input), end(input), start); };

    for (int i=0; i<10; ++i)
    {
        switch(rand()%3)
        {
            case 0: allowed = request; break;
            case 1: allowed = request | response | query; break;
            case 2: allowed = request | response | status | query; break;
        }

        std::cout << "status: "    << test("status\r\n")   << "\t"
                  << "response: "  << test("response\r\n") << "\t"
                  << "request:  "  << test("request\r\n")  << "\n";
    }
}

Like Mike mentions, this is also used in the Nabialek trick, although qi::lazy is the essential ingredient here.

This prints, e.g.:

status: 0   response: 1 request:  1
status: 0   response: 1 request:  1
status: 0   response: 0 request:  1
status: 0   response: 1 request:  1
status: 1   response: 1 request:  1
status: 0   response: 1 request:  1
status: 0   response: 1 request:  1
status: 0   response: 0 request:  1
status: 0   response: 0 request:  1
status: 0   response: 1 request:  1

2. Use inherited attributes with qi::lazy

Pretty similar to the above, you could pass 'subrules' as inherited attributes. I'm not sure I'd recommend this, as I've seen Undefined Behaviour crop up in past examples, see e.g.

3. Just use separate rules

This is most natural, in my opinion:

std::function<bool(string)> test;
switch(rand()%3)
{
    case 0: test = [&](std::string const& input) { return qi::parse(begin(input), end(input), request); }; break;
    case 1: test = [&](std::string const& input) { return qi::parse(begin(input), end(input), request | response | query); }; break;
    case 2: test = [&](std::string const& input) { return qi::parse(begin(input), end(input), request | response | status | query); }; break;
}

See full sample: http://coliru.stacked-crooked.com/a/603f093add6b9799

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • May I ask an additional question. You seem to recommend writing a parser manually quite often, but isn't that hard to do right? Back tracking doesn't seem to be too simple and how would one verify, that you got all cases? Any insights or something to read? Thank you. – Mike M Nov 10 '13 at 00:50
  • Writing a parser manually is rarely complicated, but boring and tedious. This especially hurts if the grammar changes. Only if your grammar is "wild" (think: c++) it would become hard. Testing would be done in exactly the same way in which you test your spirit grammar. – sehe Nov 10 '13 at 09:44
  • Now note: I ***did not*** recommend that you wrote your own parser, and given your comment would recommend the opposite. I was observing that this sounds like premature optimization for the reasons stated. (Have you profiled the difference? ) – sehe Nov 10 '13 at 09:47
  • @MikeM I just realized you're not the OP here :/ So my advice was mis-addressed. I hope you got the intent then. Spirit shines in the **rapid** composition of grammars, with reasonable performance: it's optimizing for developer time (at the cost of (leaky) abstraction and most importantly: compilation complexity). There are many situations when this makes a lot of sense, which is why I love Spirit. – sehe Nov 10 '13 at 11:22
  • Don't worry I didn't mis-interpret your comment. Sometimes it's just hard to put learned stuff into perspective; I mean, when to be sure to really get some part of knowledge and not only the tip of the iceberg. That's why I asked here, to put spirit in a bit larger perspective for me. Anyway I learned a lot from your (and cv_and_he's) answers in the last months, thank you for that. – Mike M Nov 10 '13 at 11:45
  • @MikeM Cheers. I keep on learning using SO. Have you been to the [lounge](http://chat.stackoverflow.com/rooms/10/loungec)? It's also a good place to hang out just to get contrasting points of view and (often well-founded) opinions. – sehe Nov 10 '13 at 11:50
  • I've looked at the lounge every now and then, but chat feels like it takes quite some time, and time is short as always... – Mike M Nov 10 '13 at 11:54
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/40904/discussion-between-sehe-and-mike-m) – sehe Nov 10 '13 at 14:24
1

Yes this is possible by using the qi::symbols parser. One can change the used symbols at runtime, so you could alter the behaviour. To use this parser for complete rules there is a little trick, called the nabialek trick http://boost-spirit.com/home/articles/qi-example/nabialek-trick/.

Basically it shows how to hook complete rules in the symbols parser.

Mike M
  • 2,263
  • 3
  • 17
  • 31