I am trying to create an optional parser rule: depending on the value of the first attribute, I want the rule to emit its data or emit nothing.

Example, for the input:

x,2,3
y,3,4
x,5,6

If the first character is a y, the line should be discarded; otherwise it should be processed. In this example, the bool is true if the third attribute is >= 4. The synthesized attribute should be std::pair<bool, unsigned int>, where the unsigned int value is the second attribute. The parser is:

namespace qi = boost::spirit::qi;
using Data = std::pair<bool, unsigned>;
BOOST_PHOENIX_ADAPT_FUNCTION(Data, make_pair, std::make_pair, 2);

class DataParser :
    public qi::grammar<
    std::string::iterator,
    boost::spirit::char_encoding::ascii,
    boost::spirit::ascii::space_type,
    std::vector<Data>()
    >
{
    qi::rule<iterator_type, encoding_type, bool()> type;
    qi::rule<iterator_type, encoding_type, bool()> side;
    // doesn't compile: qi::rule<iterator_type, encoding_type, boost::spirit::ascii::space_type, boost::optional<Data>()> line;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::space_type, qi::locals<bool, unsigned, bool>, Data()> line;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::space_type, sig_type> start;

public:
    DataParser()
        : base_type(start)
    {
        using namespace qi::labels;

        type = qi::char_[_val = _1 == 'x'];
        side = qi::int_[_val = _1 >= 4];
        line %= (qi::omit[type[_a = _1]] >> ',' >> qi::omit[qi::uint_[_b = _1]] >> ',' >> qi::omit[side[_c = _1]])[if_(_a)[_val = make_pair(_c, _b)]];
        // doesn't compile: line %= (qi::omit[type[_a = _1]] >> ',' >> qi::omit[qi::uint_[_b = _1]] >> ',' >> qi::omit[side[_c = _1]])[if_(_a)[_val = make_pair(_c, _b)].else_[_val = qi::unused]];
        // doesn't compile: line %= (type >> ',' >> qi::uint_ >> ',' >> side)[if_(_1)[_val = make_pair(_3, _2)]];
        // doesn't compile: line %= (type >> ',' >> qi::uint_ >> ',' >> side)[if_(_1)[_val = make_pair(_3, _2)].else_[_val = unused]];
        start = *line;
    }
};

I get: [[false, 2], [false, 0], [true, 5]] where I want to get: [[false, 2], [true, 5]] (the second entry should be discarded).

I tried using boost::optional<Data> as the attribute of the line rule, and also assigning unused to _val, but neither worked.

Edit after fixing the issue with the accepted answer

The new rules are now:

using Data = std::pair<bool, unsigned>;
BOOST_PHOENIX_ADAPT_FUNCTION(Data, make_pair, std::make_pair, 2);

class DataParser :
    public qi::grammar<
        std::string::iterator,
        boost::spirit::char_encoding::ascii,
        boost::spirit::ascii::blank_type,
        std::vector<Data>()
    >
{
    using Items = boost::fusion::vector<bool, unsigned, bool>;

    qi::rule<iterator_type, encoding_type, bool()> type;
    qi::rule<iterator_type, encoding_type, bool()> side;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::blank_type, Items()> line;
    qi::rule<iterator_type, encoding_type, boost::spirit::ascii::blank_type, sig_type> start;

public:
    DataParser()
        : base_type(start)
    {
        using namespace qi::labels;
        namespace px = boost::phoenix;

        type = qi::char_[_val = _1 == 'x'];
        side = qi::int_[_val = _1 >= 4];
        line = type >> ',' >> qi::uint_ >> ',' >> side;
        start = line[if_(_1)[px::push_back(_val, make_pair(_3, _2))]] % qi::eol;
    }
};

The key point is to let the semantic action decide whether the synthesized attribute should be added, using all the attributes of the preceding rule (in this case line).

Dave Savage

1 Answer


Okay. You use lots of power-tools. But remember, with great power comes....

In particular, qi::locals, phoenix, semantic actions: they're all complicating life so only use them as a last resort (or when they're a natural fit, which is rarely¹).

Think directly,

 start = *line;

 line = // ....

When you say

If the first character is a y then the line should be discarded. Otherwise it will be processed.

You can express this directly:

 line = !qi::lit('y') >> // ...

Alternatively, spell out what starters to accept:

 line = qi::omit[ qi::char_("xz") ] >> // ...

Done.

Straight Forward Mapping

Here I'll cheat by re-ordering the pair<unsigned, bool> so it matches the input order. Now everything works out of the box without "any" magic:

line   = !qi::lit('y') >> qi::omit[qi::alnum] >> ',' >> qi::int_ >> ',' >> side;
ignore = +(qi::char_ - qi::eol);

start = qi::skip(qi::blank) [ (line | ignore) % qi::eol ];

However it WILL result in the spurious entries as you noticed: Live On Compiler Explorer

Parsed: {(2, false), (0, false), (5, true)}

Improving

Now you could go hack around things by changing the eol to also eat subsequent lines that don't appear to contain valid data lines. However, it becomes unwieldy, and we still have the desire to flip the pair's members.

So, here's where I think an action could be handy:

  public:
    DataParser() : DataParser::base_type(start) {
        using namespace qi::labels;

        start  = qi::skip(qi::blank) [
              (qi::char_ >> ',' >> qi::uint_ >> ',' >> qi::int_) [
                  _pass = process(_val, _1, _2, _3) ]
            % qi::eol ];
    }

  private:
    struct process_f {
        template <typename... T>
        bool operator()(Datas& into, char id, unsigned type, int side) const {
            switch(id) {
                case 'z': case 'x':
                    into.emplace_back(side >= 4, type);
                    break;
                case 'y': // ignore
                    break;
                case 'a':
                    return false; // fail the rule
            }
            return true;
        }
    };

    boost::phoenix::function<process_f> process;

As you can see, there's a nice separation of concerns now. You parse (char, unsigned, int) and conditionally process it. That's what keeps this relatively simple compared to your attempts.
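To make that separation concrete, here is a minimal sketch of the same decision logic as a plain free function that can be called and tested without any parser. The name process_row is hypothetical (it is not part of the answer's code); it mirrors what process_f does under the assumption that 'x' and 'z' rows are kept, 'y' rows are skipped, and 'a' rows fail.

```cpp
#include <utility>
#include <vector>

using Data  = std::pair<bool, unsigned>;
using Datas = std::vector<Data>;

// Hypothetical stand-alone version of process_f's call operator.
// Returns false only when the row should make the parse fail.
bool process_row(Datas& into, char id, unsigned type, int side) {
    switch (id) {
        case 'x': case 'z':
            into.emplace_back(side >= 4, type); // keep the row
            break;
        case 'y':
            break;                              // skip the row, still a success
        case 'a':
            return false;                       // reject; in the grammar this becomes _pass = false
    }
    return true;                                // any other id is accepted without storing anything
}
```

Feeding it the three rows from the question (x,2,3 / y,3,4 / x,5,6) leaves two entries in the vector, because the y row returns true without touching it. A row is discarded by not storing it, not by failing the match.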

Live Demo

Live On Compiler Explorer

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <fmt/ranges.h>
#include <string>
#include <utility>
#include <vector>
namespace qi = boost::spirit::qi;

using Data = std::pair<bool, unsigned>;
using Datas = std::vector<Data>;

template <typename It>
class DataParser : public qi::grammar<It, Datas()> {
    using Skipper = qi::blank_type;
    qi::rule<It, Datas(), Skipper> line;
    qi::rule<It, Datas()> start;

  public:
    DataParser() : DataParser::base_type(start) {
        using namespace qi::labels;

        start  = qi::skip(qi::blank) [
              (qi::char_ >> ',' >> qi::uint_ >> ',' >> qi::int_) [
                  _pass = process(_val, _1, _2, _3) ]
            % qi::eol ];
    }

  private:
    struct process_f {
        template <typename... T>
        bool operator()(Datas& into, char id, unsigned type, int side) const {
            switch(id) {
                case 'z': case 'x':
                    into.emplace_back(side >= 4, type);
                    break;
                case 'y': // ignore
                    break;
                case 'a':
                    return false; // fail the rule
            }
            return true;
        }
    };

    boost::phoenix::function<process_f> process;
};

int main() {
    using It = std::string::const_iterator;
    DataParser<It> p;

    for (std::string const input : {
            "x,2,3\ny,3,4\nx,5,6", 
            })
    {
        auto f = begin(input), l = end(input);
        Datas d;
        auto ok = qi::parse(f, l, p, d);

        if (ok) {
            fmt::print("Parsed: {}\n", d);
        } else {
            fmt::print("Parse failed\n");
        }

        if (f!=l) {
            fmt::print("Remaining unparsed: '{}'\n", std::string(f,l));
        }
    }
}

Prints

Parsed: {(false, 2), (true, 5)}

¹ Boost Spirit: "Semantic actions are evil"?

sehe
  • Thank you for your response. I still need clarification, if you don't mind. It seems that you successfully skip the entry by setting `_pass` to `false`. When I try it, the parsing stops completely and only the first entry is returned. How, in your code, is the parser able to continue parsing the rest of the input? – Dave Savage Sep 18 '20 at 21:47
  • I do not. `_pass` is actually never used here (or always true). I just included it in case you have other applications where you want to actively _fail_ the parse (e.g. if you [add input starting with `'a'`](https://godbolt.org/z/6srvaj)). – sehe Sep 18 '20 at 22:45
  • Alternatively just [ignore the whole `_pass` thing](https://godbolt.org/z/4T4Wfq) – sehe Sep 18 '20 at 22:49
  • Obviously! I see now that you achieve the discarding by NOT modifying the vector `Datas`. So there is no way to have an entry discarded implicitly. Thanks. – Dave Savage Sep 18 '20 at 23:14