
I'm trying to parse a simple text file using boost::spirit. The text file is a newline-delimited list of strings. I can get it mostly working, except for blank lines, which I would like to skip.

I've tried several approaches, but I either stop parsing at the blank line, or I get the blank line included in my results.

Is there a way to tell my grammar to skip blank lines?

code

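#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

using boost::spirit::istream_iterator;
using namespace boost::spirit::qi;

int main()
{
  std::ifstream ifs("one.txt");
  ifs >> std::noskipws;  // Spirit has to see the newlines itself

  std::vector< std::string > people;

  // One line = one or more printable characters, then end-of-line or end-of-input
  if (parse(
       istream_iterator(ifs),
       istream_iterator(),
       *(as_string[+print >> (eol | eoi)]),
       people))
  {
    std::cout << "Size = " << people.size() << std::endl;

    for (auto person : people)
    {
      std::cout << person << std::endl;
    }
  }
}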

one.txt

Sally
Joe
Frank
Mary Ann

Bob

What I Get

Sally
Joe
Frank
Mary Ann

What I Want to Get

Sally
Joe
Frank
Mary Ann
Bob

Bonus: Can I strip leading or trailing spaces from the lines in the grammar at the same time? I need to keep the space in Mary Ann of course.

Aaron Wright
  • I think http://stackoverflow.com/questions/10669287/how-to-parse-entries-followed-by-semicolon-or-newline-boostspirit can help you a little bit. – Ramesh Chander Nov 13 '14 at 07:26

1 Answer

if (qi::phrase_parse(
            first, last,
            -qi::as_string[qi::lexeme[+(qi::char_ - qi::eol)]] % qi::eol,
            qi::blank,
            people))

I'll refer you to "Boost spirit skipper issues" for more background. Quick notes:

if (qi::phrase_parse(
//      ^ ----- phrase_parse: parse with a skipper (`qi::blank` here)
            first, last,
            -qi::as_string[qi::lexeme[+(qi::char_ - qi::eol)]] % qi::eol,
//          |                  |      |                          ^---- 1.
//          +---- 2.           |      +---- 4.
//       5. ----v       3. ----+      
            qi::blank,
            people))
  1. match a list of items separated by newlines
  2. '-' makes the item optional (ignoring blank lines)
  3. lexeme keeps whitespace inside the subexpression (it still pre-skips, though, so lines containing only whitespace count as empty lines; use no_skip if you don't want pre-skipping to happen)
  4. + requires at least 1 match, so an empty string never counts as a name
  5. the blank skipper skips spaces and tabs, but not newlines, because newlines are significant to your grammar. Also note that the lexeme still keeps the internal whitespace

See it Live On Coliru

UPDATE: In response to the comment: the added complexity was due to skipping whitespace. If you are happy to trim whitespace after the fact, by all means use

if (parse(first, last, - as_string[+(char_ - eol)] % eol, people))

See it Live On Coliru as well

Community
sehe
  • This is what I was afraid of. I'm trying to write up an article for my group at work about how easy boost::spirit is to use, and that it isn't just for complicated formats. This might scare them away. Is there no way to accomplish the same thing without having to introduce skippers? – Aaron Wright Nov 13 '14 at 15:43
  • @AaronWright exactly what part is complex? Or at least, which part is more complex than your sample? I mean, your sample just did less (too little?). If the schematic with notes looks complicated, look back to the original line of code. It's not inherently much more complex than, say, EBNF IYAM – sehe Nov 13 '14 at 15:46
  • I find the idea of introducing a skipper, just to defeat the skipper in the grammar, to be more complicated than I was hoping. I suppose it is true that if I want more functionality, I'm going to need more complexity. – Aaron Wright Nov 13 '14 at 15:52
  • @AaronWright By all means, don't introduce a skipper; just trim the whitespace. I was being Spirit-y in my solution. – sehe Nov 13 '14 at 15:59
  • Anyway, in the _spirit_ of small examples, I give you a [small HTTP response headers parsing callback](http://paste.ubuntu.com/8989134/), likewise a [query parameter parsing function](http://paste.ubuntu.com/8989173/) and a [ServiceBusConnectionStringBuilder](http://paste.ubuntu.com/8989209/). I would /not/ enjoy writing and testing these without the composable parser primitives from Spirit – sehe Nov 13 '14 at 15:59
  • I've updated the answer with the skipper-less approach (**[coliru](http://coliru.stacked-crooked.com/a/6f6d5602d00c68e4)**). I think the point I'd drive home when plugging Spirit would be **not** how "easy" it is, but how productive this power tool can be. Especially when you need genericity. – sehe Nov 13 '14 at 16:04
  • I can't argue with that. I'll remember that when writing my article. This is one of the first examples, so I wanted to build up functionality, from the simplest grammar that'll read the simplest file, and then add features such as blank lines, etc. Kind of ease them into it. Thanks for the help. – Aaron Wright Nov 13 '14 at 16:40