How to use non-blocking or asynchronous IO with Boost Spirit?

Question

Does Spirit provide any capabilities for working with non-blocking IO?

To provide a more concrete example: I'd like to use Boost's Spirit parsing framework to parse data coming in from a network socket that's been placed in non-blocking mode. If the data is not completely available, I'd like to be able to use that thread to perform other work instead of blocking.

The trivial answer is to simply read all the data before invoking Spirit, but potentially gigabytes of data would need to be received and parsed from the socket.

It seems that, in order to support non-blocking I/O while parsing, Spirit would need some ability to partially parse the data, then pause and save its parse state when no more data is available. Additionally, it would need to be able to resume parsing from the saved parse state when data does become available. Or maybe I'm making this too complicated?
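To make the "pause, save state, resume" idea concrete, here is a minimal hand-written sketch. It is not Spirit and does not use any Spirit API; the class name IncrementalIntParser and its members are hypothetical, and the input format (newline-terminated integers) is an assumption chosen only to keep the example small. All parse state lives in the object, so feed() can return as soon as the available bytes run out and continue on the next call.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical push-style parser for newline-terminated integers.
// The only saved parse state is the partially received token.
class IncrementalIntParser {
public:
    explicit IncrementalIntParser(std::function<void(long)> on_value)
        : on_value_(std::move(on_value)) {}

    // Consume whatever bytes are available right now; never blocks.
    void feed(const std::string& chunk) {
        for (char c : chunk) {
            if (c == '\n') {
                if (!pending_.empty()) {
                    // A complete token: convert it and report it.
                    on_value_(std::strtol(pending_.c_str(), nullptr, 10));
                    pending_.clear();
                }
            } else {
                pending_.push_back(c);  // keep the partial token for later
            }
        }
    }

    // True when a token has been started but not yet terminated.
    bool has_partial_input() const { return !pending_.empty(); }

private:
    std::function<void(long)> on_value_;
    std::string pending_;
};
```

A caller would invoke feed() each time the non-blocking socket reports readable data and go do other work in between. The hard part of the question is that a Spirit parser keeps its equivalent of pending_ on the call stack rather than in an object like this.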

score 4 · Accepted Answer · answered Dec 21 '11 at 21:09

TODO: I will post an example of a simple single-threaded 'event-based' parsing model. This is largely trivial, but might be just what you need.

For anything less trivial, please heed the following considerations, hints, and tips:

How would you be consuming the result? You wouldn't have the synthesized attributes any earlier anyway, or are you intending to use semantic actions on the fly?

That doesn't usually work well due to backtracking. The caveats can be worked around by careful and judicious use of qi::hold and qi::locals, and by putting semantic actions with side effects only at points that will never be backtracked over. In other words:

this is bound to be very error-prone
this naturally applies to a limited set of grammars only (grammars with rich contextual information will not lend themselves well to this treatment).

Now, everything can be forced, of course, but in general, experienced programmers should have learned to avoid swimming upstream.

Now, if you still want to do this:

You should be able to make the Spirit library thread-safe / reentrant by defining BOOST_SPIRIT_THREADSAFE and linking to libboost_thread. Note that this makes the globals used by Spirit thread-safe (at the cost of fine-grained locking), but not your parsers: you can't share your own parsers/rules/sub-grammars/expressions across threads. In fact, you can only share your own (Phoenix/Fusion) functors iff they are thread-safe, and any other extensions defined outside the core Spirit library should be audited for thread safety.

If you manage the above, I think by far the best approach would be to:

use boost::spirit::istream_iterator (or, for binary/raw character streams, I'd prefer to define a similar boost::spirit::istreambuf_iterator using the boost::spirit::multi_pass<> template class) to consume the input. Note that, depending on your grammar, quite a bit of memory could be used for buffering, and the performance is suboptimal
run the parser on its own thread (or logical thread, e.g. Boost Asio 'strands' or its famous 'stackless coroutines')
use coarse-grained semantic actions, as described above, to pass messages to another logical thread that does the actual processing.
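The overall shape of the steps above can be sketched with the standard library alone. Everything here is an assumption made for illustration: MessageQueue, parse_stream, and sum_records are hypothetical names, and the loop in parse_stream is only a stand-in for a Spirit grammar reading through boost::spirit::istream_iterator, with out.push() playing the role of the coarse-grained semantic action.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <istream>
#include <mutex>
#include <queue>
#include <sstream>
#include <thread>
#include <utility>

// Minimal blocking queue (hypothetical helper) for handing parsed records
// from the parser thread to the processing thread.
template <typename T>
class MessageQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no further records will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a record is available; returns false once the queue
    // is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Stand-in for the parser: reads whitespace-separated integers and posts
// each one as a message. A Spirit grammar would sit here instead.
void parse_stream(std::istream& in, MessageQueue<int>& out) {
    int record;
    while (in >> record) out.push(record);
    out.close();
}

// Runs the parser on its own thread and processes records on the caller's.
long sum_records(std::istream& in) {
    MessageQueue<int> queue;
    std::thread parser(parse_stream, std::ref(in), std::ref(queue));
    long total = 0;
    int record;
    while (queue.pop(record)) total += record;
    parser.join();
    return total;
}
```

The parser thread may block on the stream, but the consuming thread only ever waits on the queue, so the two sides are decoupled. Note that the queue in this sketch is unbounded; with gigabytes of input a real design would probably need back-pressure.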

Some more loose pointers:

you can easily adapt some functions to handle lazy evaluation of your semantic action handlers using BOOST_PHOENIX_ADAPT_FUNCTION and friends. This reduces the amount of cruft you have to write to get simple things working, like normal C++ overload resolution in semantic actions, especially when you're not using C++0x and BOOST_RESULT_OF_USE_DECLTYPE
because you will want to avoid semantic actions with side effects, you should probably look at Inherited Attributes and qi::locals<> to coordinate state across rules in a 'pure functional' fashion.

Thanks! I look forward to seeing your example. I'm glad you highlighted some of the points relating to backtracking. One of the grammars I'm working with has lots of expectation points in it so I'm OK there, but it's good for me to be aware of that for other grammars. — pyrachi, Dec 21 '11 at 21:39
As for how I was going to consume the results, I hadn't gotten to that point yet, but it's really its own question. I was going to do a deep-dive into semantic actions and/or the attribute compatibility rules to determine if there was some way I could "stream" the attributes out after an expectation point was reached. — pyrachi, Dec 21 '11 at 21:44
@Emanuel: erm. No, the question about consuming the results _defines_ whether streaming makes any sense. You should have a design/model for that before complicating your parser step unnecessarily, thank you very much. There are numerous operations that simply cannot be made streaming (think of an XSLT transform, or simply a reversing filter a la [`tac`](http://linux.die.net/man/1/tac) -- you cannot beat entropy, not even if you try very hard to multithread and message queue. You'll just get lots of locked threads or long queues). — sehe, Dec 21 '11 at 21:48
Because it's not good style to never deliver on the promise, I refer to this recent example for a single- and multi-threaded event-based parser design: https://stackoverflow.com/questions/41748596/how-to-incrementally-parse-and-act-on-a-large-file-with-boost-spirit-qi/41753143#41753143 — sehe, Jan 23 '17 at 17:35
