
In Boost.Spirit, how can I parse entries that are each followed by either a semicolon or a newline with an optional semicolon?

Example input, where each entry is an int and a double:

12 1.4;
63 13.2
2423 56.4 ; 5 8.1

Here is example code that just parses entries followed by whitespace:

#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>
#include <boost/foreach.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace qi = boost::spirit::qi;

typedef std::pair<int, double> Entry;

template <typename Iterator, typename Skipper>
struct MyGrammar : qi::grammar<Iterator, std::vector<Entry>(), Skipper> {
  MyGrammar() : MyGrammar::base_type(entries) {
    entry = qi::int_ >> qi::double_;
    entries = +entry;
  }
  qi::rule<Iterator, Entry(), Skipper> entry;
  qi::rule<Iterator, std::vector<Entry>(), Skipper> entries;
};

int main() {
  typedef boost::spirit::istream_iterator It;
  std::cin.unsetf(std::ios::skipws);
  It it(std::cin), end;

  MyGrammar<It, qi::space_type> entry_grammar;
  std::vector<Entry> entries;
  if (qi::phrase_parse(it, end, entry_grammar, qi::space, entries)
      && it == end) {
    BOOST_FOREACH(Entry const& entry, entries) {
      std::cout << entry.first << " and " << entry.second << std::endl;
    }
  }
  else {
    std::cerr << "FAIL" << std::endl;
    exit(1);
  }
  return 0;
}

Now, to parse the way I want (each entry followed by semicolon or newline with optional semicolon), I replaced this:

    entries = +entry;

by this:

 entries = +(entry >> (qi::no_skip[qi::eol] || ';'));

where the Boost.Spirit operator || (sequential-or) means: (a followed by an optional b) or b. But parsing fails if there is a space after the 1.4 in this example input:

12 1.4
63 13.2

It makes sense that the space is not matched because of the no_skip but I wasn't able to find a solution.

Frank

2 Answers


Here's my take.

  • You might want to know about qi::blank (which matches only spaces and tabs, i.e. qi::space without the line breaks that qi::eol matches). Using it as the skipper removes the need for no_skip.
  • The core grammar becomes:

        entry = qi::int_ >> qi::double_;
        entries = entry % +qi::char_("\n;") >> qi::omit[*qi::space];
    
  • Use BOOST_SPIRIT_DEBUG to learn where parsing fails and why (e.g. backtracking)

Output:

12 and 1.4
63 and 13.2
2423 and 56.4
5 and 8.1

Full code:

//#define BOOST_SPIRIT_DEBUG
#include <cstdlib>
#include <iostream>
#include <utility>
#include <vector>
#include <boost/foreach.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace qi = boost::spirit::qi;

typedef std::pair<int, double> Entry;

template <typename Iterator, typename Skipper>
struct MyGrammar : qi::grammar<Iterator, std::vector<Entry>(), Skipper> {
    MyGrammar() : MyGrammar::base_type(entries) {
        entry = qi::int_ >> qi::double_;
        entries = 
            entry % +qi::char_("\n;")          // the data
            >> qi::omit[*qi::space] > qi::eoi; // trailing whitespace
        BOOST_SPIRIT_DEBUG_NODE(entry);
        BOOST_SPIRIT_DEBUG_NODE(entries);
    }
    qi::rule<Iterator, Entry(), Skipper> entry;
    qi::rule<Iterator, std::vector<Entry>(), Skipper> entries;
};

int main() {
    typedef boost::spirit::istream_iterator It;
    std::cin.unsetf(std::ios::skipws);
    It it(std::cin), end;

    MyGrammar<It, qi::blank_type> entry_grammar;
    std::vector<Entry> entries;
    if (qi::phrase_parse(it, end, entry_grammar, qi::blank, entries)
            && it == end) {
        BOOST_FOREACH(Entry const& entry, entries) {
            std::cout << entry.first << " and " << entry.second << std::endl;
        }
    }
    else {
        std::cerr << "FAIL" << std::endl;
        exit(1);
    }
    return 0;
}
sehe
  • Thanks for your answer. The problem is, I do need qi::space as the overall skipper. My `entry` is actually more complicated than in my mini example and I need to skip newlines in entries, so I can't use qi::blank as the overall skipper. – Frank May 20 '12 at 05:19
  • I hope my answer was helpful anyway, outlining: [`The % list parser`](http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/qi/reference/operator/list.html), `qi::omit` as well as `qi::space` (which should still be preferred with `qi::skip(qi::space)[...]` instead of `qi::no_skip[...]` because it uncomplicates your own expressions) – sehe May 20 '12 at 10:09

Okay, I found that this works fine:

entries = +(entry >> (qi::no_skip[*qi::lit(' ') >> qi::eol] || ';'));

So the immediate question is solved.

But it will still fail if a tab comes after the 1.4 in

12 1.4
63 13.2

This would be better but it won't compile:

entries = +(entry >> (qi::no_skip[*qi::space >> qi::eol] || ';'));

The error:

error: invalid static_cast from type ‘const std::pair<int, double>’ to type ‘int’
Frank
  • We meet again. You could fix that problem by using `qi::omit[]` so you don't expose the delimiters as an attribute. However, see **[my answer](http://stackoverflow.com/a/10670125/85371)** for the more typical solution – sehe May 20 '12 at 01:36
  • Thanks, qi::omit[] is very useful. So this one works well for me: `entries = +(entry >> (qi::no_skip[qi::omit[*qi::blank] >> qi::eol] || ';'));` – Frank May 20 '12 at 05:22
  • But, why I need to use `omit[]` on `blank` but not on `eol` is beyond me. One would think that `eol` would be exposed as well, as the string `\n`. – Frank May 20 '12 at 05:24
  • `qi::eol`, like `qi::lit` is a 'literal' parser, and it doesn't expose an attribute. Simple as that, see the [documentation (the 'Attributes' sections in the reference)](http://www.boost.org/doc/libs/1_49_0/libs/spirit/doc/html/spirit/qi/reference/char/char.html#spirit.qi.reference.char.char.attributes) – sehe May 20 '12 at 10:06