Boost.Spirit Alternative Parser parallelization

Question

I'm parsing a buffer using x3 rule that has many alternative sub-rules. Actually, I have data from different GPS devices and my main parser looks like this :

 auto gps_r = device1_r | device2_r | device3_r;

 bool ok = x3::parse(...,gps_r,..);

I understand I may implement invoking of x3::parse() in parallel for input data and for each device rule. But that may not be applicable for some recursive parsing (say SAX DOM parsing).

My question is more theoretical : are there any attempts to make Alternative Parser being async (say, using boost.coroutines2) to make parsing in parallel?

If you find anything about it, I'd love to know. I can use that too. — Richard Hodges, Aug 29 '18 at 07:29
There aren't. And I highly doubt this would be a net performance win. The only area where I see this being of benefit is doing input _validation_ of _large_ messages (so e.g. checksum/digest calculation), and even here it would be when the protocol is lacking and just "checking all possible formats" seems to run counter to secure programming practices (opening up to as many DOS/attack vectors as possible with malicious loads) — sehe, Aug 29 '18 at 08:03
If you have concrete examples, we might come up with existing approaches (e.g. when [creating a static Lexer](https://www.boost.org/doc/libs/1_68_0/libs/spirit/doc/html/spirit/lex/abstracts/lexer_static_model.html), all paths for static alternatives get compiled into 1 state machine that _does_ do the least amount of work possible for all "branches" in parallel, much like a DFA representation for a regex) — sehe, Aug 29 '18 at 08:09
Ok, I see the point. Thank you for the link to lexer. The real example : parsing xml/json file (10Mb). Even when prosessing simple json : {"foo" : 1, "bar" : { "sub" : false }}, we have some rule like 'auto json_r_def = x3::char_ >> value_r | array_r | json_r;'. That actually means we are doing sequesntial processing of the same data for different rules. And the only way to encrease performance here is to parallelize execution. — drus, Aug 29 '18 at 08:40

score 1 · Accepted Answer · answered Aug 29 '18 at 20:22

1

Usually there is no room for parallelization as composed grammars tries to perform as less lookahead and backtrack as possible. In this case any benefits from parallel parsing will be outweighed by thread spawning and synchronization overhead.

If your grammar actually branches from the beginning like in the shown example - you can rewrite it to run multiple x3::parse in parallel.

There is also problem in current Spirit alternate parser implementation, as it does not flatten the expression tree (while it is a binary tree, usually it is very unbalanced).

answered Aug 29 '18 at 20:22

Nikita Kniazev

3,728
2
16
30

That last bit is very insightful tidbit, but I reckon it only hurts compilation time, not so much anything else? I think I could whip up a custom parser that does parallel alternatives [like here](https://stackoverflow.com/questions/37669936/spirit-qi-how-can-i-write-a-nonterminal-parser) but like you confirmed, I think if that "helps" there is a protocol design mistake of some size instead. – sehe Aug 30 '18 at 00:18
I do not think it hurts compilation time, flattening will require even more template instantiations and recursion depth is negligible to push hard the inliner. – Nikita Kniazev Aug 30 '18 at 01:01

Boost.Spirit Alternative Parser parallelization

1 Answers1