Is Boost skip parser the right approach?

Question

After some delay, I'm again trying to parse an ASCII text file that is wrapped in a few binary characters.

Parsing text file with binary envelope using Boost Spirit

However, I'm now unsure whether a skip parser is the right approach.

The grammar of the file (it's a JEDEC file) is quite simple:

Each data field in the file starts with a single letter and ends with an asterisk. The data field can contain spaces and carriage returns. After the asterisk, spaces and carriage returns may also follow before the next field identifier.

This is what I used to start building a parser for such a file:

phrase_parse(first, last, 
             // First char in File
             char_('\x02') >>

             // Data field
             *((print[cout << _1] | graph[cout << _1]) - char_('*')) >>

             // End of data followed by 4 digit hexnumber. How to limit?
             char_('\x03') >> *xdigit,

             // Skip asterisks
             char_('*') );

Unfortunately I don't get any output from this one. Does someone have an idea what might be wrong?

Sample file:

<STX>
JEDEC file generated by John Doe*
DM SIGNETICS(PHILIPS)*
DD GAL16R8*
QP20*
QV0*
G0*F0*
L00000 1110101111100110111101101110111100111111*
CDEAD*
<ETX>BEEF

and this is what I want to achieve:

Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF

"The data field can contain spaces and carriage return". Well. Tell us whether that is significant or should be skipped. Come on, you're asking us to tell you whether skipping is a good idea. You should know what you want to achieve... — sehe, Mar 27 '15 at 12:21
Ok. I want to skip the asterisk, so I guess it's the right approach. But given that my approach doesn't output anything, I'm unsure whether there might be a better approach to that problem. — fhw72, Mar 27 '15 at 12:24
You haven't described "the problem". Or even "the goal". And, you don't want to skip the asterisk (it's important structural information). Skipping != not capturing — sehe, Mar 27 '15 at 12:25
You should give a sample of the input, and the goal/expected output. — sehe, Mar 27 '15 at 12:26

score 2 · Accepted Answer · edited May 23 '17 at 12:06

I would suggest using a skipper at the top-level rule only, and using it to skip the insignificant whitespace.

You don't use a skipper for the asterisks because you do not want to ignore them. If they're ignored, your rules cannot act upon them.

Furthermore the inner rules should not use the space skipper for the simple reason that whitespace and linefeeds are valid field data in JEDEC.

So, the upshot of all this would be:

value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
field = ascii::graph >> value;
start = STX >> value >> *field >> ETX >> xmit_checksum;

Where the rules would be declared with the respective skippers:

qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;
qi::rule<It, ascii::space_type> start;
qi::rule<It>             field, value; // no skippers - they are lexemes

Take-away: Split your grammar up into rules. Be happier for it.

Processing the results

Your sample needlessly mixes responsibilities for parsing and "printing". I'd suggest not using semantic actions here (Boost Spirit: "Semantic actions are evil"?).

Instead, declare appropriate attribute types:

struct JEDEC {
    std::string caption;
    struct field { 
        char id;
        std::string value;
    };
    std::vector<field> fields;
    uint16_t checksum;
};

And declare them in your rules:

qi::rule<It, ast::JEDEC(), ascii::space_type> start;
qi::rule<It, ast::JEDEC::field()>             field;
qi::rule<It, std::string()>                   value;
qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;

Now, nothing needs to be changed in your grammar, and you can print the desired output with:

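inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
    os << "Start: " << jedec.caption << "\n";
    for(auto& f : jedec.fields)
        os << f.id << ": " << f.value << "\n";

    // Save the formatting state: flags() holds hex/uppercase, fill() the padding character.
    // (rdstate()/setstate() would only touch the error state, not the formatting.)
    auto saved_flags = os.flags();
    auto saved_fill  = os.fill('0');
    os << "End: " << std::hex << std::uppercase << std::setw(4) << jedec.checksum;
    os.fill(saved_fill);
    os.flags(saved_flags);

    return os;
}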

LIVE DEMO

Here's a demo program that ties it together using the sample input from your question:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace ascii = qi::ascii;

namespace ast {
    struct JEDEC {
        std::string caption;
        struct field { 
            char id;
            std::string value;
        };
        std::vector<field> fields;
        uint16_t checksum;
    };

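    inline static std::ostream& operator<<(std::ostream& os, JEDEC const& jedec) {
        os << "Start: " << jedec.caption << "\n";
        for(auto& f : jedec.fields)
            os << f.id << ": " << f.value << "\n";

        // Save and restore the formatting state (flags and fill character);
        // rdstate()/setstate() would only affect the stream's error state.
        auto saved_flags = os.flags();
        auto saved_fill  = os.fill('0');
        os << "End: " << std::hex << std::uppercase << std::setw(4) << jedec.checksum;
        os.fill(saved_fill);
        os.flags(saved_flags);

        return os;
    }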
}

BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC::field,
        (char, id)(std::string, value))
BOOST_FUSION_ADAPT_STRUCT(ast::JEDEC,
        (std::string, caption)
        (std::vector<ast::JEDEC::field>, fields)
        (uint16_t, checksum))

template <typename It> 
struct JedecGrammar : qi::grammar<It, ast::JEDEC(), ascii::space_type>
{
    JedecGrammar() : JedecGrammar::base_type(start) {
        const char STX = '\x02';
        const char ETX = '\x03';

        value = *(ascii::char_("\x20-\x7e\r\n") - '*') >> '*';
        field = ascii::graph >> value;
        start = STX >> value >> *field >> ETX >> xmit_checksum; 

        BOOST_SPIRIT_DEBUG_NODES((start)(field)(value))
    }
  private:
    qi::rule<It, ast::JEDEC(), ascii::space_type> start;
    qi::rule<It, ast::JEDEC::field()>             field;
    qi::rule<It, std::string()>                   value;
    qi::uint_parser<uint16_t, 16, 4, 4>           xmit_checksum;
};

int main() {
    typedef boost::spirit::istream_iterator It;
    It first(std::cin>>std::noskipws), last;

    JedecGrammar<It> g;

    ast::JEDEC jedec;
    bool ok = phrase_parse(first, last, g, ascii::space, jedec);

    if (ok)
    {
        std::cout << "Parse success\n";
        std::cout << jedec;
    }
    else
        std::cout << "Parse failed\n";

    if (first != last)
        std::cout << "Remaining input unparsed: '" << std::string(first, last) << "'\n";
}

Output:

Start: JEDEC file generated by John Doe
D: M SIGNETICS(PHILIPS)
D: D GAL16R8
Q: P20
Q: V0
G: 0
F: 0
L: 00000 1110101111100110111101101110111100111111
C: DEAD
End: BEEF

Take-away: See your dentist twice a year.

Wow! First of all: thanks a lot. I'm more than amazed by the extent of your help. This is really helpful for me, as your explanations are well-founded. I'll now dig deeper into it and hopefully get a better understanding of the general usage of the Boost parser framework with this example. — fhw72, Mar 27 '15 at 13:33
