Extract messages from stream and ignore data between the messages using a boost::spirit parser

Question

I'm trying to create a (pretty simple) parser using boost::spirit::qi to extract messages from a stream. Each message starts with a short marker and ends with \r\n. The message body is ASCII text (letters and numbers) with fields separated by commas. For example:

!START,01,2.3,ABC\r\n
!START,456.2,890\r\n

I'm using unit tests to check the parser and everything works well when I pass only correct messages one by one. But when I try to emulate some invalid input, like:

!START,01,2.3,ABC\r\n
trash-message
!START,456.2,890\r\n

The parser doesn't see any of the messages that follow the unexpected text.

I'm new to boost::spirit and I'd like to know how a parser based on boost::spirit::qi::grammar is supposed to work.

My question is: Should the parser slide in the input stream and try to find a beginning of a message? Or should the caller check the parsing result and, in case of failure, advance the iterator and call the parser again?

Many thanks for considering my request.

score 1 · Accepted Answer · answered Jan 30 '23 at 16:39

My question is: Should the parser slide in the input stream and try to find a beginning of a message?

Only when you tell it to. It's called qi::parse, not qi::search. But obviously you can make a grammar ignore things.

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

namespace qi = boost::spirit::qi;

struct Command {
    enum Type { START, QUIT, TRASH } type = TRASH;
    std::vector<std::string> args;
};
using Commands = std::vector<Command>;
BOOST_FUSION_ADAPT_STRUCT(Command, type, args)

template <typename It> struct CmdParser : qi::grammar<It, Commands()> {
    CmdParser() : CmdParser::base_type(commands_) {
        type_.add("!start", Command::START);
        type_.add("!quit", Command::QUIT);

        trash_    = *~qi::char_("\r\n"); // just ignore the entire line
        arg_      = *~qi::char_(",\r\n");
        command_  = qi::no_case[type_] >> *(',' >> arg_);
        commands_ = *((command_ | trash_) >> +qi::eol);

        BOOST_SPIRIT_DEBUG_NODES((trash_)(arg_)(command_)(commands_))
    }

  private:
    qi::symbols<char, Command::Type> type_;
    qi::rule<It, Commands()>         commands_;
    qi::rule<It, Command()>          command_;
    qi::rule<It, std::string()>      arg_;
    qi::rule<It>                     trash_;
};

int main() {
    std::string_view input = "!START,01,2.3,ABC\r\n"
                             "trash-message\r\n"
                             "!START,456.2,890\r\n";

    using It = std::string_view::const_iterator;
    static CmdParser<It> const parser;
    Commands parsed;
    auto f = input.begin(), l = input.end();
    if (parse(f, l, parser, parsed)) {
        std::cout << "Parsed:\n";
        for(Command const& cmd : parsed) {
            std::cout << cmd.type;
            for (auto& arg: cmd.args)
                std::cout << ", " << quoted(arg);
            std::cout << "\n";
        }
    } else {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed: " << quoted(std::string(f, l)) << "\n";
}

Printing

Parsed:
0, "01", "2.3", "ABC"
2
0, "456.2", "890"

If you don't care much about efficiency: http://coliru.stacked-crooked.com/a/f336aad934f968f8 and if you care about compilation time: http://coliru.stacked-crooked.com/a/c064427d30cf8d83 - sehe, Jan 30 '23 at 16:52
(note that the latter employs `x3::seek[]` which can be emulated in `qi` - see https://www.boost.org/doc/libs/1_81_0/libs/spirit/repository/example/qi/seek.cpp ) - sehe, Jan 30 '23 at 16:58
Hello @sehe, first of all, thank you so much for your prompt reply with a detailed explanation and working example! By the way, it is worth mentioning that I wrote my parser based on your other example: https://stackoverflow.com/questions/42811884/boostspirit-struggle-with-parsing-a-string . So this is not the first time you have helped me! I hope I understood the underlying idea: to skip unexpected data, that data should also be described in the grammar and matched as an alternative using "|". - Gluttton, Jan 30 '23 at 20:25
Yes. A more productive way to put it, in my mind, is to just be precise about what _defines_ "trash" or "ignorable content". As soon as you define it, you can accommodate it in your grammar. - sehe, Jan 30 '23 at 21:34
