Boost spirit x3 - lazy parser with compile time known parsers, referring to a previously matched value

Question

Inspired by sehe's answer to "Boost spirit x3 - lazy parser", I tried to adapt it to one of my own problems (which is another story).
The grammar I want to implement has several ways to express numeric literals, with bases 2, 8, 10 and 16. I've reduced the approach mentioned above to what is hopefully a bearable minimum.

In the AST I'd like to preserve the numeric representation (integer, fractional and exponent parts) as boost::iterator_range<> by using x3::raw, so that I can evaluate it later; only the base shall be of integer type. Honestly, I don't know the future requirements yet (I can imagine several possibilities, even evaluating it to a real/integer in the parser, but most of the time reality looks different). For simplicity, I've used std::string here:

    struct number {
        unsigned    base;
        std::string literal;
    };

Since the base and the digits can have embedded underscores, I've used range-v3's views::filter() function to remove them. sehe has shown another approach to handling such delimited numbers in "X3 parse rule doesn't compile".
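
As a rough illustration of the "evaluate it later" step, a minimal standalone sketch (plain C++17, no X3 or range-v3) could strip the underscores and convert the stored literal with std::from_chars. The helper names strip_underscores and evaluate are hypothetical and not part of the code on Coliru; the choice of std::uint64_t as the result type is an assumption as well.

```cpp
#include <algorithm>
#include <charconv>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: copy the literal without its '_' separators.
std::string strip_underscores(std::string_view literal)
{
    std::string digits;
    digits.reserve(literal.size());
    std::remove_copy(literal.begin(), literal.end(),
                     std::back_inserter(digits), '_');
    return digits;
}

// Hypothetical helper: evaluate a stored literal for a base in [2, 36].
// Returns std::nullopt for empty input, invalid digits or overflow.
std::optional<std::uint64_t> evaluate(unsigned base, std::string_view literal)
{
    std::string const digits = strip_underscores(literal);
    if (digits.empty() || base < 2 || base > 36)
        return std::nullopt;

    std::uint64_t value = 0;
    char const* const end = digits.data() + digits.size();
    auto const [ptr, ec] =
        std::from_chars(digits.data(), end, value, static_cast<int>(base));

    // Require that the whole literal was consumed without error.
    if (ec != std::errc{} || ptr != end)
        return std::nullopt;
    return value;
}
```

With this, evaluate(16, "DEAD_BEEF") yields 0xDEADBEEF, while evaluate(2, "012") yields std::nullopt because '2' is not a binary digit. Note that the sketch does not check where the underscores are placed; that is left to the parser.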

The core idea (I used Qi's Nabialek trick a long time ago) is to have something like

    auto const char_set = [](auto&& char_range, char const* name) {
        return x3::rule<struct _, std::string>{ name } = x3::as_parser(
            x3::raw[ x3::char_(char_range) >> *(-lit("_") >> x3::char_(char_range)) ]);
    };
    auto const bin_charset = char_set("01", "binary charset");
    auto const oct_charset = char_set("0-7", "octal charset");
    auto const dec_charset = char_set("0-9", "decimal charset");
    auto const hex_charset = char_set("0-9a-fA-F", "hexadecimal charset");
    
    using Value = ast::number;
    using It    = std::string::const_iterator;
    using Rule  = x3::any_parser<It, Value>;
    
    x3::symbols<Rule> const based_parser({
            { 2,  as<std::string>[ bin_charset ] },
            { 8,  as<std::string>[ oct_charset ] },
            { 10, as<std::string>[ dec_charset ] },
            { 16, as<std::string>[ hex_charset ] }
        }, "based character set"
    );
    
    auto const base = x3::rule<struct _, unsigned>{ "base" } = dec_charset; // simplified
    
    auto const parser = x3::with<Rule>(Rule{}) [
        x3::lexeme[ set_lazy<Rule>[based_parser] >> '#' >> do_lazy<Rule> ]
    ];
    
    auto const grammar = x3::skip[ x3::space ]( parser >> x3:: eoi );

and use it like

    for (std::string const input : {
            "2#0101",
            "8#42",
            "10#4711",
            "1_6#DEAD_BEEF",
        })
    {
       ...
    }

Well, it doesn't compile and hence I do not know if it would work this way. I think it's a better way than several lines of alternatives (as in my old code). Furthermore, newer standards of the grammar I want to implement extend the syntax with a leading integer (for the numeric width) and other base specifiers, e.g. 'UB', 'UO' and others. This might be off-topic, but: how can I prepare the code for further grammar extensions (using something like eps[get<std_tag>(ctx) == x42])?

For convenience, I've put the example on Coliru.

sehe · Accepted Answer · 2022-07-03T02:40:33.870

Well, it doesn't compile and hence I do not know if it would work this way.

Where to start? Let me recommend baby steps. X3 is not a framework where you can throw together a bunch of code and expect it to just compile, let alone do what you want.

Some notes:

- a symbols key needs to be a character sequence, not an integer value
- the rule type synthesizes a Value (as you declared Rule = any_parser<It, Value>). However, you "coerce" the symbol expressions to std::string using as<std::string>. That is not compatible.
- if you also want to store the matched symbol, perhaps use &sym >> x3::uint_ >> '#' to handle it

Let me combine the factories:

template<typename...> struct Tag { };
template<typename T, typename P>
auto
as(P p, char const* name = "as")
{
    return x3::rule<Tag<T, P>, T>{name} = x3::as_parser(p);
}

Now you can simply write

auto const delimit_numeric_digits = [](auto&& char_range, char const* name)
{
    auto cs = x3::char_(char_range);
    return as<std::string>(x3::raw[cs >> *('_' >> +cs | cs)], name);
};
auto const bin_digits = delimit_numeric_digits("01", "binary digits");
auto const oct_digits = delimit_numeric_digits("0-7", "octal digits");
auto const dec_digits = delimit_numeric_digits("0-9", "decimal digits");
auto const hex_digits = delimit_numeric_digits("0-9a-fA-F", "hexadecimal digits");

(See how I improved on the naming, since charset really didn't cover it).

Next, fixing the symbol lookup:

using Rule = x3::any_parser<It, std::string>;

x3::symbols<Rule> const based_parser({
    {"2#", bin_digits},
    {"8#", oct_digits},
    {"10#", dec_digits},
    {"16#", hex_digits},
});

Notably, the digits only synthesize std::string, not the base. Now, use the trick outlined above to still expose the base as an integer:

auto const parser                              //
    = x3::rule<struct _, Value, true>{"Value"} //
    = x3::with<Rule>(Rule{})[                  //
    x3::lexeme
        [&set_lazy<Rule>[based_parser] >> x3::uint_ >> '#' >> do_lazy<Rule>]];

Live Demo

Live On Coliru

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>

#include <iostream>
#include <iomanip>
#include <string>

namespace x3 = boost::spirit::x3;

namespace ast {
    struct number {
        unsigned    base;
        std::string literal;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::number, base, literal)

std::ostream&
operator<<(std::ostream& os, ast::number const& n)
{
    return os << n.base << '#' << n.literal;
}

namespace Parsing {

template<typename...> struct Tag { };
template<typename T, typename P>
auto
as(P p, char const* name = "as")
{
    return x3::rule<Tag<T, P>, T>{name} = x3::as_parser(p);
}

template<typename Tag>
struct set_lazy_type
{
    template<typename P>
    auto
    operator[](P p) const
    {
        auto action = [](auto& ctx) { // set rhs parser
            x3::get<Tag>(ctx) = x3::_attr(ctx);
        };
        return p[action];
    }
};

template<typename Tag>
struct do_lazy_type : x3::parser<do_lazy_type<Tag>>
{
    using attribute_type = typename Tag::attribute_type; // TODO FIXME?

    template<typename It, typename Ctx, typename RCtx, typename Attr>
    bool
    parse(It& first, It last, Ctx& ctx, RCtx& rctx, Attr& attr) const
    {
        auto& subject = x3::get<Tag>(ctx);

        It saved = first;
        x3::skip_over(first, last, ctx);
        if(x3::as_parser(subject).parse(
               first,
               last,
               std::forward<Ctx>(ctx),
               std::forward<RCtx>(rctx),
               attr))
        {
            return true;
        } else
        {
            first = saved;
            return false;
        }
    }
};

template<typename T> static const set_lazy_type<T> set_lazy{};
template<typename T> static const do_lazy_type<T> do_lazy{};

auto const delimit_numeric_digits = [](auto&& char_range, char const* name)
{
    auto cs = x3::char_(char_range);
    return as<std::string>(x3::raw[cs >> *('_' >> +cs | cs)], name);
};
auto const bin_digits = delimit_numeric_digits("01", "binary digits");
auto const oct_digits = delimit_numeric_digits("0-7", "octal digits");
auto const dec_digits = delimit_numeric_digits("0-9", "decimal digits");
auto const hex_digits = delimit_numeric_digits("0-9a-fA-F", "hexadecimal digits");

using Value = ast::number;
using It = std::string::const_iterator;
using Rule = x3::any_parser<It, std::string>;

x3::symbols<Rule> const based_parser({
    {"2#", bin_digits},
    {"8#", oct_digits},
    {"10#", dec_digits},
    {"16#", hex_digits},
});

auto const parser                              //
    = x3::rule<struct _, Value, true>{"Value"} //
    = x3::with<Rule>(Rule{})[                  //
    x3::lexeme
        [&set_lazy<Rule>[based_parser] >> x3::uint_ >> '#' >> do_lazy<Rule>]];

auto const grammar = x3::skip(x3::space)[parser >> x3::eoi];
} // namespace Parsing

int main()
{
    for(std::string const input : {
            "2#0101",
            "8#42",
            "10#4711",
            "1_6#DEAD_BEEF",
        })
    {
        Parsing::Value attr;
        if(parse(begin(input), end(input), Parsing::grammar, attr))
        {
            std::cout << std::quoted(input) << " -> success (" << attr << ")\n";
        } else
        {
            std::cout << std::quoted(input) << " -> failed\n";
        }
    }
}

Prints

"2#0101" -> success (2#0101)
"8#42" -> success (8#42)
"10#4711" -> success (10#4711)
"1_6#DEAD_BEEF" -> failed

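The last input fails because the symbols table only holds the exact keys 2#, 8#, 10# and 16#, so 1_6# matches none of them. One possible way around this, sketched here in plain C++17 without X3, is to normalize the base prefix first and only then pick the digit set. The names digit_value, well_formed and parse_based_literal are hypothetical, the accepted bases are assumed to be 2, 8, 10 and 16 as in the table above, and whitespace skipping is left out.

```cpp
#include <optional>
#include <string>
#include <string_view>

struct number {            // mirrors ast::number from the listing above
    unsigned    base;
    std::string literal;
};

// Hypothetical helper: numeric value of a digit, or 16 if c is not a digit.
unsigned digit_value(char c)
{
    if (c >= '0' && c <= '9') return static_cast<unsigned>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<unsigned>(c - 'a') + 10;
    if (c >= 'A' && c <= 'F') return static_cast<unsigned>(c - 'A') + 10;
    return 16;
}

// Hypothetical helper: digits of the given base (at most 16), separated by
// single underscores, with no leading, trailing or doubled '_'.
bool well_formed(std::string_view text, unsigned base)
{
    bool previous_was_digit = false;
    for (char c : text) {
        if (c == '_') {
            if (!previous_was_digit) return false; // leading or doubled '_'
            previous_was_digit = false;
        } else if (digit_value(c) < base) {
            previous_was_digit = true;
        } else {
            return false;                          // not a digit in this base
        }
    }
    return previous_was_digit; // rejects empty input and a trailing '_'
}

// Hypothetical helper: split at '#', clean up the base, then validate the
// digits against that base. The literal is kept exactly as written.
std::optional<number> parse_based_literal(std::string_view input)
{
    auto const hash = input.find('#');
    if (hash == std::string_view::npos) return std::nullopt;

    std::string_view const base_part  = input.substr(0, hash);
    std::string_view const digit_part = input.substr(hash + 1);

    if (!well_formed(base_part, 10)) return std::nullopt;
    unsigned base = 0;
    for (char c : base_part) {
        if (c == '_') continue;
        base = base * 10 + digit_value(c);
        if (base > 16) return std::nullopt;        // also guards overflow
    }
    if (base != 2 && base != 8 && base != 10 && base != 16)
        return std::nullopt;

    if (!well_formed(digit_part, base)) return std::nullopt;
    return number{base, std::string(digit_part)};
}
```

Under these assumptions, parse_based_literal("1_6#DEAD_BEEF") and parse_based_literal("001_6#DEAD_BEEF") both produce base 16 with the literal DEAD_BEEF, while "2#012" is rejected. The same idea could be moved into a custom X3 parser that stores the cleaned-up base in the context before dispatching to the matching digit rule.
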
Your explanations are enlightening as always, thank you! I had forgotten that symbols requires a character sequence. So this approach does not work. However, I also need to be able to parse base specifiers like "001_6". Hence the idea of using a self-written parser class for this task: the base is cleaned up first, and the actual bin/.../hex parser is selected afterwards. This would be an approach like the one you showed in [Spirit X3, referring to a previously matched value](https://stackoverflow.com/questions/62143841/spirit-x3-referring-to-a-previously-matched-value/62178987#62178987). - Olx, Jul 04 '22 at 16:45
