6

Can you help me understand the difference between the a % b parser and its expanded a >> *(b >> a) form in Boost.Spirit? Even though the reference manual states that they are equivalent,

The list operator, a % b, is a binary operator that matches a list of one or more repetitions of a separated by occurrences of b. This is equivalent to a >> *(b >> a).

the following program produces different results depending on which is used:

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct Record {
  int id;
  std::vector<int> values;
};

BOOST_FUSION_ADAPT_STRUCT(Record,
  (int, id)
  (std::vector<int>, values)
)

int main() {
  namespace qi = boost::spirit::qi;

  const auto str = std::string{"1: 2, 3, 4"};

  const auto rule1 = qi::int_ >> ':' >> (qi::int_ % ',')                 >> qi::eoi;
  const auto rule2 = qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_)) >> qi::eoi;

  Record record1;
  if (qi::phrase_parse(str.begin(), str.end(), rule1, qi::space, record1)) {
    std::cout << record1.id << ": ";
    for (const auto& value : record1.values) { std::cout << value << ", "; }
    std::cout << '\n';
  } else {
    std::cerr << "syntax error\n";
  }

  Record record2;
  if (qi::phrase_parse(str.begin(), str.end(), rule2, qi::space, record2)) {
    std::cout << record2.id << ": ";
    for (const auto& value : record2.values) { std::cout << value << ", "; }
    std::cout << '\n';
  } else {
    std::cerr << "syntax error\n";
  }
}

Live on Coliru

1: 2, 3, 4, 
1: 2, 

rule1 and rule2 are different only in that rule1 uses the list operator ((qi::int_ % ',')) and rule2 uses its expanded form ((qi::int_ >> *(',' >> qi::int_))). However, rule1 produced 1: 2, 3, 4, (as expected) and rule2 produced 1: 2,. I cannot understand the result of rule2: 1) why is it different from that of rule1 and 2) why were 3 and 4 not included in record2.values even though phrase_parse returned true somehow?

  • 2
    Someone who has the pleasure of remembering the Spirit terminology can explain it better, but they look equivalent in what they match, but not in the type they accept for storing the values. The second appears to work with `(int, int, vector)` rather than `(int, vector)`. I'm guessing `vector` is compatible with a single `int_` and then the repeated `int_`s (within the `*(…)`) are ignored when storing values. – chris Nov 20 '15 at 00:26
  • 1
    Yup that's it, see my answer @chris. There's a _big_ error in the code though, so there's that (anything could have happened) – sehe Nov 20 '15 at 00:29
  • @sehe, Thanks, I'm interested in Spirit, but I haven't had a chance to use it for something yet. The CppCon X3 talk was pretty cool. – chris Nov 20 '15 at 00:30
  • 1
    @chris That triggered me to do the corresponding test cases in Spirit X3. As expected, the situation is a whole lot better on the whole (no UB, no silent attribute propagation failures, not to mention vastly improved compile times :)). See my [second answer](https://stackoverflow.com/a/33817135/85371) – sehe Nov 20 '15 at 00:52
  • 1
    Another difference is that if `b` has an attribute (not the case in your example), it is ignored in `a%b` but not in `a >>*(b >> a)`. `a >> *(omit[b] >> a)` would be closer (but with the same problems shown in sehe's answer). – llonesmiz Nov 20 '15 at 05:47

2 Answers2

10

Update X3 version added

First off, you fallen into a deep trap here:

Qi rules don't work with auto. Use qi::copy or just used qi::rule<>. Your program has undefined behaviour and indeed it crashed for me (valgrind pointed out where the dangling references originated).

So, first off:

const auto rule = qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')                 >> qi::eoi); 

Now, when you delete the redundancy in the program, you get:

Reproducing the problem

Live On Coliru

int main() {
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
}

Printing

1: 2, 3, 4, 
1: 2, 

The cause and the fix

What happened to 3, 4 which was successfully parsed?

Well, the attribute propagation rules indicate that qi::int_ >> *(',' >> qi::int_) exposes a tuple<int, vector<int> >. In a bid to magically DoTheRightThing(TM) Spirit accidentally misfires and "assigngs" the int into the attribute reference, ignoring the remaining vector<int>.

If you want to make container attributes parse as "an atomic group", use qi::as<>:

test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));

Here as<> acts as a barrier for the attribute compatibility heuristics and the grammar knows what you meant:

Live On Coliru

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct Record {
  int id;
  using values_t = std::vector<int>;
  values_t values;
};

BOOST_FUSION_ADAPT_STRUCT(Record, id, values)

namespace qi = boost::spirit::qi;

template <typename T>
void test(T const& rule) {
    const std::string str = "1: 2, 3, 4";

    Record record;

    if (qi::phrase_parse(str.begin(), str.end(), rule >> qi::eoi, qi::space, record)) {
        std::cout << record.id << ": ";
        for (const auto& value : record.values) { std::cout << value << ", "; }
        std::cout << '\n';
    } else {
        std::cerr << "syntax error\n";
    }
}

int main() {
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ % ',')));
    test(qi::copy(qi::int_ >> ':' >> (qi::int_ >> *(',' >> qi::int_))));
    test(qi::copy(qi::int_ >> ':' >> qi::as<Record::values_t>() [ qi::int_ >> *(',' >> qi::int_)]));
}

Prints

1: 2, 3, 4, 
1: 2, 
1: 2, 3, 4, 
sehe
  • 374,641
  • 47
  • 450
  • 633
10

Because it's time to get people started with X3 (the new version of Spirit), and because I like to challenge msyelf to do the corresponding tasks in Spirit X3, here is the Spirit X3 version.

There's no problem with auto in X3.

The "broken" case also behaves much better, triggering this static assertion:

    // If you got an error here, then you are trying to pass
    // a fusion sequence with the wrong number of elements
    // as that expected by the (sequence) parser.
    static_assert(
        fusion::result_of::size<Attribute>::value == (l_size + r_size)
      , "Attribute does not have the expected size."
    );

That's nice, right?

The workaround seems a bit less readable:

test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));

But it would be trivial to write your own as<> "directive" (or just a function), if you wanted:

namespace {
    template <typename T>
    struct as_type {
        template <typename Expr>
            auto operator[](Expr&& expr) const {
                return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
            }
    };

    template <typename T> static const as_type<T> as = {};
}

DEMO

Live On Coliru

#include <iostream>
#include <string>
#include <vector>

#include <boost/fusion/adapted/std_tuple.hpp>
#include <boost/spirit/home/x3.hpp>

struct Record {
    int id;
    using values_t = std::vector<int>;
    values_t values;
};

namespace x3 = boost::spirit::x3;

template <typename T>
void test(T const& rule) {
    const std::string str = "1: 2, 3, 4";

    Record record;

    auto attr = std::tie(record.id, record.values);

    if (x3::phrase_parse(str.begin(), str.end(), rule >> x3::eoi, x3::space, attr)) {
        std::cout << record.id << ": ";
        for (const auto& value : record.values) { std::cout << value << ", "; }
        std::cout << '\n';
    } else {
        std::cerr << "syntax error\n";
    }
}

namespace {
    template <typename T>
    struct as_type {
        template <typename Expr>
            auto operator[](Expr&& expr) const {
                return x3::rule<struct _, T>{"as"} = x3::as_parser(std::forward<Expr>(expr));
            }
    };

    template <typename T> static const as_type<T> as = {};
}

int main() {
    using namespace x3;
    test(int_ >> ':' >> (int_ % ','));
    //test(int_ >> ':' >> (int_ >> *(',' >> int_))); // COMPILER asserts "Attribute does not have the expected size."

    // "clumsy" x3 style workaround
    test(int_ >> ':' >> (rule<struct _, Record::values_t>{} = (int_ >> *(',' >> int_))));

    // using an ad-hoc `as<>` implementation:
    test(int_ >> ':' >> as<Record::values_t>[int_ >> *(',' >> int_)]);
}

Prints

1: 2, 3, 4, 
1: 2, 3, 4, 
1: 2, 3, 4, 
sehe
  • 374,641
  • 47
  • 450
  • 633