
In case of an alternative where one path has nothing to match for an optional, how should it be processed?

Consider this MCVE. It is not my real code, but the smallest example I could come up with to express what I intend to do:

I am parsing into a foo AST that has 3 fields. The second one is optional and may be nullopt. The third field, an int, has a parsing validation rule that depends on whether this second field is present.

In this example, with the ';' delimiter (where the double may be present) the int must be even; with the '|' delimiter (where there is no double) it must be odd.

Valid cases 

foobar:3.14;4 
foobar;4 
foobar|5 

Invalid cases
foobar:3.14;5 
foobar;5 
foobar|4 
foobar:3.14|4 

#include <iostream>
#include <string>
#include <optional>
#include <string_view>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>

namespace x3 = boost::spirit::x3;

namespace ast{
    struct foo {
        std::string string_value;
        std::optional<double> optional_double_value;
        int int_value;
    };
    
}

template <typename T>
std::ostream& operator<<(std::ostream& os, const std::optional<T> & opt)
{
    return opt ? os << opt.value() : os << "nullopt";
}

std::ostream& operator<<(std::ostream& os, const ast::foo & foo)
{
    return os << "string_value :"<<  foo.string_value << " optional_double : " << foo.optional_double_value << " int : " << foo.int_value;
}


BOOST_FUSION_ADAPT_STRUCT(ast::foo, string_value, optional_double_value,int_value)

namespace parser {

 
    const auto even_int = x3::rule<struct even_int, int> {"even int"}
    = x3::int_ [ ([](auto& ctx) {
        auto& attr = x3::_attr(ctx);
        auto& val  = x3::_val(ctx);
        val = attr;
        x3::_pass(ctx) = x3::_val(ctx) %2 == 0;
    }) ];
    

    const auto odd_int = x3::rule<struct odd_int, int> {"odd int"}
    = x3::int_ [ ([](auto& ctx) {
        auto& attr = x3::_attr(ctx);
        auto& val  = x3::_val(ctx);
        val = attr;
        x3::_pass(ctx) = x3::_val(ctx) %2 != 0; // != 0 so negative odd values pass too
    }) ];
    
    const auto foo =  ( *x3::alpha  >> -(':' >> x3::double_) >> ';' >> even_int )
                       ;//|  (  *x3::alpha >>  '|' >> odd_int ) ;
                 
}


template <typename Parser, typename Attr>
static inline bool parse(std::string_view in, Parser const& p, Attr& result)
{
    return x3::parse(in.begin(), in.end(), p, result);
}

int main()
{
    for (auto& input : { "foobar:3.14;4", "foobar;4","foobar|5"}) {
        ast::foo result;
        if (!parse(input, parser::foo, result))
            std::cout << "parsing " << input << " failed" << std::endl;
        else
            std::cout << "parsing " << input << " success : " << result <<  std::endl;
    }
}

Uncommenting the second alternative (the odd-int branch) raises:

/usr/local/include/boost/spirit/home/x3/operator/detail/sequence.hpp:144:25: error: static assertion failed: Size of the passed attribute is bigger than expected.

  144 |             actual_size <= expected_size

which I understand because, well, that branch has two "tokens" where there should be three. How can I handle that?

Bonus question:

Why

 auto even_int = x3::rule<struct even_int, int> {"even int"}
    = ...

cannot simply be defined as

 auto even_int = ...;

(which fails to compile)

sandwood

2 Answers


All the symptoms (including the one in the bonus question) stem from imperfect attribute propagation machinery.

Automatic attribute propagation is very nice, but there will continue to be cases where you have to help the system.

Looking at your desired rule and outcomes:

const auto foo
    = *x3::alpha >> -(':' >> x3::double_) >> ';' >> even_int
    | *x3::alpha >> '|' >> odd_int
    ;

I conclude that you want the same rule in both branches, just without the optional double for odd ordinals, and with a different delimiter for even vs. odd.

I would try to stay closer to the declarative nature of parser expressions and try to make the verdict more high-level. E.g.


#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <optional>
#include <string>

namespace ast {
    enum class discriminator { even, odd };
    struct foo {
        std::string           s;
        std::optional<double> od;
        discriminator         ind;
        int                   id;

        bool is_valid() const {
            bool is_even = 0 == (id % 2);
            switch (ind) {
              case discriminator::even: return is_even;
              case discriminator::odd: return not(is_even or od.has_value());
              default: return false;
            }
        }
    };

    std::ostream& operator<<(std::ostream& os, const foo& foo)
    {
        os << std::quoted(foo.s); //
        if (foo.od.has_value())
            os << "(" << *foo.od << ")";
        return os << " " << foo.id //
                  << " (" << (foo.is_valid() ? "valid" : "INVALID") << ")";
    }
} // namespace ast

BOOST_FUSION_ADAPT_STRUCT(ast::foo, s, od, ind, id)

namespace parser {
    namespace x3 = boost::spirit::x3;

    static const auto indicator_ = [] {
        x3::symbols<ast::discriminator> sym;
        sym.add                        //
            (";", ast::discriminator::even) //
            ("|", ast::discriminator::odd);
        return sym;
    }();

    static const auto foo //
        = +x3::alpha >> -(':' >> x3::double_) >> indicator_ >> x3::int_;
}

int main()
{
    for (std::string const input : {
             "foobar:3.14;4",
             "foobar;4",
             "foobar|5",

             // Invalid cases
             "foobar:3.14;5",
             "foobar;5",
             "foobar|4",
             "foobar:3.14|4",
         }) //
    {
        ast::foo result;
        if (parse(input.begin(), input.end(), parser::foo, result))
            std::cout << std::quoted(input) << " -> " << result << std::endl;
        else
            std::cout << std::quoted(input) << " Syntax error" << std::endl;
    }
}

Prints

"foobar:3.14;4" -> "foobar"(3.14) 4 (valid)
"foobar;4" -> "foobar" 4 (valid)
"foobar|5" -> "foobar" 5 (valid)
"foobar:3.14;5" -> "foobar"(3.14) 5 (INVALID)
"foobar;5" -> "foobar" 5 (INVALID)
"foobar|4" -> "foobar" 4 (INVALID)
"foobar:3.14|4" -> "foobar"(3.14) 4 (INVALID)

Note that you could view this approach as a separation of syntax and semantics.
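
To make that separation concrete outside of Spirit, here is a minimal standard-library sketch of the same two steps: a purely syntactic pass that fills a record, followed by a separate semantic check. The names raw_foo, parse_syntax and is_valid are hypothetical, and the hand-written scanner only approximates the X3 grammar above (for instance, strtod is more lenient than x3::double_, and this sketch insists on consuming the whole input). It is meant as an illustration, not as a replacement for the parser.

```cpp
#include <cctype>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical names throughout; this is not Spirit code.
enum class discriminator { even, odd };

struct raw_foo {
    std::string           s;
    std::optional<double> od;
    discriminator         ind;
    int                   id;
};

// Step 1: syntax only. Accepts  alpha+ [':' double] (';' | '|') int
inline std::optional<raw_foo> parse_syntax(std::string const& in) {
    raw_foo out{};
    char const* p = in.c_str();

    // One or more letters.
    while (std::isalpha(static_cast<unsigned char>(*p)))
        out.s += *p++;
    if (out.s.empty())
        return std::nullopt;

    // Optional ':' followed by a double.
    if (*p == ':') {
        char* end = nullptr;
        double d = std::strtod(p + 1, &end);
        if (end == p + 1)
            return std::nullopt; // nothing numeric after ':'
        out.od = d;
        p = end;
    }

    // The delimiter only records which check applies later.
    if (*p == ';')      out.ind = discriminator::even;
    else if (*p == '|') out.ind = discriminator::odd;
    else                return std::nullopt;
    ++p;

    // Trailing int; the whole input must be consumed.
    char* end = nullptr;
    long n = std::strtol(p, &end, 10);
    if (end == p || *end != '\0')
        return std::nullopt;
    out.id = static_cast<int>(n);
    return out;
}

// Step 2: semantics only, same rule as ast::foo::is_valid above.
inline bool is_valid(raw_foo const& f) {
    bool const is_even = (f.id % 2 == 0);
    return f.ind == discriminator::even ? is_even
                                        : !is_even && !f.od.has_value();
}
```

With this, parse_syntax("foobar:3.14;5") succeeds syntactically while is_valid rejects the result, which mirrors the (INVALID) lines in the output above.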

Alternatives/Improving From Here

Of course you can now write the parse as

return parse(input.begin(), input.end(), parser::foo, result)
    && result.is_valid();

Or if you insist you can encapsulate that check in a semantic action like before:

auto is_valid_ = [](auto& ctx) {
    _pass(ctx) = _val(ctx).is_valid();
};

static const auto foo                              //
    = x3::rule<struct foo_, ast::foo, true>{"foo"} //
    = (+x3::alpha >> -(':' >> x3::double_) >> indicator_ >>
       x3::int_)[is_valid_];

Now the output morphs into:


"foobar:3.14;4" -> "foobar"(3.14) 4 (valid)
"foobar;4" -> "foobar" 4 (valid)
"foobar|5" -> "foobar" 5 (valid)
"foobar:3.14;5" Syntax error
"foobar;5" Syntax error
"foobar|4" Syntax error
"foobar:3.14|4" Syntax error

Without Fusion

Now, the above explicitly still used fusion sequence adaptation with automatic attribute propagation. However, since you're deep into semantic actions anyways¹, you can of course do the rest of the work there:


#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <optional>
#include <string>

namespace ast {
    struct foo {
        std::string           s;
        std::optional<double> od;
        int                   id;
    };

    std::ostream& operator<<(std::ostream& os, const foo& foo)
    {
        os << std::quoted(foo.s); //
        if (foo.od.has_value())
            os << "(" << *foo.od << ")";
        return os << " " << foo.id;
    }
} // namespace ast

namespace parser {
    namespace x3 = boost::spirit::x3;
    enum class discriminator { even, odd };

    static const auto indicator_ = [] {
        x3::symbols<discriminator> sym;
        sym.add                        //
            (";", discriminator::even) //
            ("|", discriminator::odd);
        return sym;
    }();

    auto make_foo = [](auto& ctx) {
        using boost::fusion::at_c;
        auto& attr = _attr(ctx);
        auto& s    = at_c<0>(attr); // where are
        auto& od   = at_c<1>(attr); // structured bindings
        auto& ind  = at_c<2>(attr); // when you
        auto& id   = at_c<3>(attr); // need them? :|

        bool  is_even = 0 == (id % 2);

        if (ind == discriminator::even)
            _pass(ctx) = is_even;
        else
            _pass(ctx) = not(is_even or od.has_value());

        _val(ctx) = ast::foo{
            std::move(s),
            od.has_value() ? std::make_optional(*od) : std::nullopt, id};
    };

    static const auto foo = x3::rule<struct foo_, ast::foo> {}
        = (+x3::alpha >> -(':' >> x3::double_) >> indicator_ >>
           x3::int_)[make_foo];
} // namespace parser

int main()
{
    for (std::string const input : {
             "foobar:3.14;4",
             "foobar;4",
             "foobar|5",

             // Invalid cases
             "foobar:3.14;5",
             "foobar;5",
             "foobar|4",
             "foobar:3.14|4",
         }) //
    {
        ast::foo result;

        if (parse(input.begin(), input.end(), parser::foo, result))
            std::cout << std::quoted(input) << " -> " << result << std::endl;
        else
            std::cout << std::quoted(input) << " Syntax error" << std::endl;
    }
}

This has pros and cons. The pros would be

  • reduced compile time
  • discriminator is now private to the parser

Cons:

  • you're doing manual propagation (like boost::optional->std::optional which is clumsy)
  • semantic actions¹

Hybrid

As you can probably tell, I'm not fond of the hand-writing-attribute-propagation genuflection. If you must hide the ind field from the ast, perhaps make it so:


#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>

namespace ast {
    struct foo {
        std::string           s;
        std::optional<double> od;
        int                   id;
    };

    std::ostream& operator<<(std::ostream& os, const foo& foo)
    {
        os << std::quoted(foo.s); //
        if (foo.od.has_value())
            os << "(" << *foo.od << ")";
        return os << " " << foo.id;
    }
} // namespace ast

namespace parser {
    namespace x3 = boost::spirit::x3;
    enum class discriminator { even, odd };

    struct p_foo : ast::foo {
        discriminator ind;

        struct semantic_error : std::runtime_error {
            using std::runtime_error::runtime_error;
        };

        void check_semantics() const {
            bool is_even = 0 == (id % 2);
            switch (ind) {
              case discriminator::even:
                  if (!is_even)
                      throw semantic_error("id should be even");
                  break;
              case discriminator::odd:
                  if (is_even)
                      throw semantic_error("id should be odd");
                  if (od.has_value())
                      throw semantic_error("illegal double at odd foo");
                  break;
              }
        }
    };
}

BOOST_FUSION_ADAPT_STRUCT(parser::p_foo, s, od, ind, id)

namespace parser {
    static const auto indicator_ = [] {
        x3::symbols<discriminator> sym;
        sym.add                        //
            (";", discriminator::even) //
            ("|", discriminator::odd);
        return sym;
    }();

    static const auto raw_foo      //
        = x3::rule<p_foo, p_foo>{} //
        = +x3::alpha >> -(':' >> x3::double_) >> indicator_ >> x3::int_;

    auto checked_ = [](auto& ctx) {
        auto& _pf = _attr(ctx);
        _pf.check_semantics();
        _val(ctx) = std::move(_pf);
    };
    static const auto foo                   //
        = x3::rule<struct foo_, ast::foo>{} //
        = raw_foo[checked_];
} // namespace parser

int main()
{
    for (std::string const input : {
             "foobar:3.14;4",
             "foobar;4",
             "foobar|5",

             // Invalid cases
             "foobar:3.14;5",
             "foobar;5",
             "foobar|4",
             "foobar:3.14|4",
             "foobar:3.14|5",
         }) //
    {
        ast::foo result;

        try {
            if (parse(input.begin(), input.end(), parser::foo, result))
                std::cout << std::quoted(input) << " -> " << result << std::endl;
            else
                std::cout << std::quoted(input) << " Syntax error" << std::endl;
        } catch (std::exception const& e) {
            std::cout << std::quoted(input) << " Semantic error: " << e.what() << std::endl;
        }
    }
}

Printing

"foobar:3.14;4" -> "foobar"(3.14) 4
"foobar;4" -> "foobar" 4
"foobar|5" -> "foobar" 5
"foobar:3.14;5" Semantic error: id should be even
"foobar;5" Semantic error: id should be even
"foobar|4" Semantic error: id should be odd
"foobar:3.14|4" Semantic error: id should be odd
"foobar:3.14|5" Semantic error: illegal double at odd foo

Note the richer diagnostic information.


Post Scriptum: Minimal Change

Later, re-reading your question, I suddenly realized there was a smaller change that would help your grammar. I introduced my answer with the words:

Automatic attribute propagation is very nice, but there will continue to be cases where you have to help the system

Here you can help it by making both branches have the same structure. So instead of

const auto foo
    = *x3::alpha >> -(':' >> x3::double_) >> ';' >> even_int
    | *x3::alpha >> '|' >> odd_int
    ;

You could manually insert an empty optional double in the middle of the odd branch:

const auto foo                                               //
    = +x3::alpha >> -(':' >> x3::double_) >> ';' >> even_int //
    | +x3::alpha >> x3::attr(ast::optdbl{}) >> '|' >> odd_int;

(where optdbl is an alias for std::optional<double> for style).

Now, if you refactor those odd_int/even_int rules a bit, I'd say this approach has some appeal over the other options above:


#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <optional>
#include <string>

namespace ast{
    using optdbl = std::optional<double>;

    struct foo {
        std::string s;
        optdbl      od;
        int         id;
    };

    std::ostream& operator<<(std::ostream& os, const foo& foo)
    {
        os << std::quoted(foo.s); //
        if (foo.od.has_value())
            os << "(" << *foo.od << ")";
        return os << " " << foo.id;
    }
}

BOOST_FUSION_ADAPT_STRUCT(ast::foo, s, od,id)

namespace parser {
    namespace x3 = boost::spirit::x3;

    static auto mod2check(int remainder) {
        return [=](auto& ctx) { //
            _pass(ctx) = (_val(ctx) % 2 != 0) == (remainder != 0); // sign-safe: -3 % 2 is -1 in C++
        };
    }

    static auto mod2int(int remainder) {
        return x3::rule<struct _, int, true>{} = x3::int_[mod2check(remainder)];
    }

    const auto foo                                           //
        = +x3::alpha >>                                      //
        (-(':' >> x3::double_) | x3::attr(ast::optdbl{})) >> //
        (';' >> mod2int(0) | '|' >> mod2int(1))              //
        ;
} // namespace parser

int main()
{
    for (std::string const input : {
             "foobar:3.14;4",
             "foobar;4",
             "foobar|5",

             // Invalid cases
             "foobar:3.14;5",
             "foobar;5",
             "foobar|4",
             "foobar:3.14|4",
         }) //
    {
        ast::foo result;
        if (parse(input.begin(), input.end(), parser::foo, result))
            std::cout << std::quoted(input) << " -> " << result << std::endl;
        else
            std::cout << std::quoted(input) << " Syntax error" << std::endl;
    }
}

¹ Boost Spirit: "Semantic actions are evil"?

sehe
  • I thought of a smaller fix that probably will enlighten you the most. Added at the end under "Post Scriptum" – sehe Sep 20 '21 at 22:17
  • Oh, darnit. That runs into rollback issues with the string.(["foobarfoobar"](http://coliru.stacked-crooked.com/a/1e8cab539d086b79)) Maybe better to pick an approach without the alternative branch after all. – sehe Sep 20 '21 at 22:35
  • Falling on the infamous not automatic rollback issue on alternative parser when a branch fail... I would definitely prefer the post-scriptum last solution because my real life example is really : I have a token A , then maybe B, then C or A and C' where the capital letter are C++ type and the ' is only the parsing rule that differ (between C and C'). Is there a way to "factorise" the initial A parsing which is common and would avoid the rollback issue ? – sandwood Sep 21 '21 at 19:22
  • @sandwood I think in most senses you're stuck with semantic actions then. UNLESS you can cheat your way out of that with `x3::raw[]` (only works as long as the attribute _exactly_ matches the source sequence, even when literals or skippers are involved.) Mmm. not too sure that fixes it. Lemme try – sehe Sep 21 '21 at 19:24
  • Nah. [`x3::raw[]` just synthesizes the same value, the propagation still appends.](http://coliru.stacked-crooked.com/a/792a4b4062c14da8) Okay. Anything against just parsing into a "superset sequence" always and have some validation on top? At some point you have to be practical. I don't think Spirit is a framework to guide your parsers. You should probably aim for the sweet spot, or use something else (e.g. postprocessing the AST is a viable option, a bit like my "Hybrid" option, where that was actually still inside the rules) – sehe Sep 21 '21 at 19:28
  • Oh look: this particular example you can: http://coliru.stacked-crooked.com/a/1477836ef46363e0 Whew. Simple, isn't it (hahaha). BUT now the optional double is accepted on both branches. Bah. – sehe Sep 21 '21 at 19:33
  • Sooo. wear your goggles: http://coliru.stacked-crooked.com/a/2fe4139f77bff23e is showing the best I can currently do with `clumsy_manual_propagate` and factored branches. Mind you, I still cheated a bit by changing `std::optional` to `boost::optional`. I maintain that the second version above ("Without Fusion") is strictly that but better. (And it _does_ use `std::optional`) – sehe Sep 21 '21 at 19:52
  • Thanks for all your effort ! Your last one is IMHO the best solution because : 1/ it stays closer to the ABNF. (easier to maintain) 2/ the propagation stuff is not "so clumsy". With some better name (nested_tuple_to_flat_tuple) and a comments : (A ,(B,C)) -> (A, B,C) the purpose become understandable. 3/ Not sure why but locally with my real use case (B is a boost FUSION adapted struct) I do not get the conversion error between std::optional and boost::optional in the inner lambda -> meaning I can have a boost free (i.e. with std::optional) AST which is a pre-requisite. – sandwood Sep 21 '21 at 20:52
  • That [should not compile](http://coliru.stacked-crooked.com/a/3e5bc558020efb10). Of course you can make it [more clumsy still](http://coliru.stacked-crooked.com/a/58bccc63b5b8640f). I hear you when you say it's "not that clumsy". That's probably my fault. I worked too hard to get a semblance of elegance. What I fundamentally dislike about it is that it duplicates random bits of X3 library logic and details into your code making it hard to maintain, and potentially forgetting edge cases leading to annoying/surprising bugs. In short: it negates most of what makes Spirit worth it in my opinion – sehe Sep 21 '21 at 21:42

In case of an alternative where one path has nothing to match for an optional, how should it be processed?

For such cases there is the attr(x) parser. It produces a copy of x every time it is 'parsed', without consuming any input.

So the answer for

how to process in case of an optional that can be nullopt in an alternative case?

is to use attr(std::nullopt), like this:

    const auto foo =  ( *x3::alpha  >> -(':' >> x3::double_) >> ';' >> even_int )
                       |  (  *x3::alpha >> x3::attr(std::nullopt) >>  '|' >> odd_int ) ;

https://godbolt.org/z/E5jM6s6vW

Nikita Kniazev