I'm currently trying to write a parser for an ASCII text file that is surrounded by a small envelope with checksum.

The basic structure of the file is: <0x02><"File payload"><0x03><16bit CRC>

and I want to extract the payload in another string to feed it to the next parser.

The parser expression I use to parse this envelope is:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print >> char_('\x03') >> *xdigit,
    space
);

The input is consumed... and I already tried to dump out the payload:

qi::phrase_parse(
    first, last,
    char_('\x02') >> *print[cout << _1] >> char_('\x03') >> *xdigit,
    space
);

But the problem is that every newline, blank, etc. is omitted!

Now my questions:

  1. How do I extract the content between the 0x02/0x03 (STX/ETX) bytes correctly, without omitting spaces, newlines, etc.?

  2. Is my approach of first removing the envelope and then parsing the payload a good one, or is there a better approach I should use?

fhw72

1 Answer

Use e.g. qi::seek or qi::confix to get you started (both are part of the Spirit repository: http://www.boost.org/doc/libs/1_57_0/libs/spirit/repository/doc/html/spirit_repository/qi_components/directives/confix.html).

But the problem is that every newline, blank, etc. is omitted!

Well, that's what a skipper does. Don't use one, or:

Use qi::raw[]

To extract the intervening text, I suggest using qi::raw. I'm not sure you actually want to copy it to a string, though (copying sounds expensive). A copy probably does make sense when the source is a stream (or some other source of input iterators).

Seminal rule:

myrule = '\x02' > raw [ *(char_ - '\x03') ] > '\x03';

You could add the checksumming:

myrule = '\x02' > raw [ *(char_ - '\x03') ] [ _a = _checksum(_1) ] > '\x03' >> qi::word(_a);

Assuming

  • qi::locals<uint16_t>
  • _checksum is a suitable Phoenix functor that takes a pair of source iterators and returns uint16_t

Of course you might prefer to keep checksumming outside the parser.
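
If you do keep the checksum outside the parser, a function over a pair of iterators is all you need, and the same function could be wrapped as the _checksum functor above. The sketch below is plain C++ with no Spirit involved. Note the big assumption: the question does not say which 16-bit CRC the file uses, so I picked CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) purely as an example. The name crc16_ccitt is mine.

```cpp
#include <cstdint>
#include <string>

// Assumption: CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF, no reflection,
// no final XOR). Swap in whatever variant the file format really uses.
template <typename It>
std::uint16_t crc16_ccitt(It first, It last) {
    std::uint16_t crc = 0xFFFF;
    for (; first != last; ++first) {
        // Fold the next byte into the high byte of the running CRC.
        crc ^= static_cast<std::uint16_t>(static_cast<unsigned char>(*first) << 8);
        // Process the byte bit by bit, most significant bit first.
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000)
                      ? static_cast<std::uint16_t>((crc << 1) ^ 0x1021)
                      : static_cast<std::uint16_t>(crc << 1);
        }
    }
    return crc;
}
```

For this variant the standard check input "123456789" gives 0x29B1. You would then compare the result for the payload range against the value read after ETX.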

sehe
  • Thanks a lot! I had missed the 'confix' chapter so far, and it seems the best approach to me. In the end I'd prefer to parse the file directly into the program's data structures. However: can I do that and calculate the checksum at the same time? By calling two functors? – fhw72 Mar 19 '15 at 13:05
  • No problem. Just do as I showed, and don't forget to use %= when assigning to the rule – sehe Mar 19 '15 at 13:07
  • Ok... thanks. I'll try to follow your suggestion! Before I have to write the lexer I guess. Let's see how far I can get without asking dumb questions again. :-) – fhw72 Mar 19 '15 at 13:16
  • Just one last question: What is the best approach to develop the parser using boost Qi AND Spirit.Lex? Start with the lexer first or implement parsers for the tokens? – fhw72 Mar 19 '15 at 15:18
  • Well, if you're _building a parser on the tokenstream_, you'd better have a tokenstream :) That said, I don't generally recommend separating the lexer out. It adds enough complexity to knock your parser out of the sweet spot for Qi usage – sehe Mar 19 '15 at 15:23
  • Sorry... I don't understand your last answer. You mean: You'd recommend no lexer at all? – fhw72 Mar 19 '15 at 15:33
  • It's hard to provide a general recommendation, without knowing the grammar, but in general, once grammars get involved enough to "require" (benefit from) a lexer, I'd say Spirit is likely not the most convenient tool anymore. Of course, if you know exactly the ins and outs and the limitations, you could find the exceptions to this rule of thumb :) – sehe Mar 19 '15 at 16:13