Inspired by sehe's answer to
"Boost spirit x3 - lazy parser",
I tried to adapt it to one of my own problems (which is another story).
The grammar I want to implement has several ways to express numeric literals,
with bases 2, 8, 10 and 16. I've reduced the approach mentioned above to what is
hopefully a bearable minimum.
In the AST I'd like to preserve the numeric representation (integer, fractional and exponent parts)
as boost::iterator_range<> by means of x3::raw,
so that it can be evaluated later; only the base shall be of
integer type. Honestly, I don't know the future requirements yet (I can imagine
several possibilities, even having the parser evaluate the literal to a real/integer, but most of
the time reality looks different). For simplicity, I've used std::string
here:
struct number {
unsigned base;
std::string literal;
};
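To illustrate what "evaluate it later" could look like for the integer case, here is a minimal sketch in plain C++17. The helper name `evaluate` is hypothetical and not part of my code; it assumes the literal has already had its underscores removed and that only the integer part is stored.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <system_error>

namespace ast {
    struct number {
        unsigned base;
        std::string literal;
    };
}

// Hypothetical helper: converts the preserved literal to an integer only
// when the value is actually needed. Assumes the literal contains no '_'.
std::optional<unsigned long long> evaluate(ast::number const& n) {
    // std::from_chars only accepts bases in the range 2..36
    if (n.base < 2 || n.base > 36) return std::nullopt;

    unsigned long long value = 0;
    char const* first = n.literal.data();
    char const* last  = first + n.literal.size();

    auto [ptr, ec] = std::from_chars(first, last, value, static_cast<int>(n.base));

    // Reject overflow, empty input and trailing characters that are not
    // digits of the given base.
    if (ec != std::errc{} || ptr != last) return std::nullopt;
    return value;
}
```

Called as `evaluate(ast::number{16, "DEADBEEF"})`, this yields an engaged optional; a digit outside the base, as in `ast::number{8, "9"}`, yields `std::nullopt`.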
Since both the base and the number can have embedded underscores, I've used
range-v3's views::filter() function to strip them. sehe has shown another
approach to handling such separated numbers in
"X3 parse rule doesn't compile".
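For readers without range-v3 at hand, the underscore removal can be sketched with the standard library alone. This is a stand-in, not the code I actually use, and `strip_underscores` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical stand-in for the views::filter() step: copies every character
// of the raw match except the '_' separators.
std::string strip_underscores(std::string_view raw) {
    std::string out;
    out.reserve(raw.size());
    std::remove_copy(raw.begin(), raw.end(), std::back_inserter(out), '_');
    return out;
}
```

Note that this does no validation; it relies on the parser having already rejected misplaced underscores.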
The core idea (I used Qi's Nabialek trick a long time ago) is to have something like
auto const char_set = [](auto&& char_range, char const* name) {
return x3::rule<struct _, std::string>{ name } = x3::as_parser(
x3::raw[ x3::char_(char_range) >> *(-lit("_") >> x3::char_(char_range)) ]);
};
auto const bin_charset = char_set("01", "binary charset");
auto const oct_charset = char_set("0-7", "octal charset");
auto const dec_charset = char_set("0-9", "decimal charset");
auto const hex_charset = char_set("0-9a-fA-F", "hexadecimal charset");
using Value = ast::number;
using It = std::string::const_iterator;
using Rule = x3::any_parser<It, Value>;
x3::symbols<Rule> const based_parser({
{ 2, as<std::string>[ bin_charset ] },
{ 8, as<std::string>[ oct_charset ] },
{ 10, as<std::string>[ dec_charset ] },
{ 16, as<std::string>[ hex_charset ] }
}, "based character set"
);
auto const base = x3::rule<struct _, unsigned>{ "base" } = dec_charset; // simplified
auto const parser = x3::with<Rule>(Rule{}) [
x3::lexeme[ set_lazy<Rule>[based_parser] >> '#' >> do_lazy<Rule> ]
];
auto const grammar = x3::skip[ x3::space ]( parser >> x3::eoi );
and to use it like this:
for (std::string const input : {
"2#0101",
"8#42",
"10#4711",
"1_6#DEAD_BEEF",
})
{
...
}
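To pin down what I expect from the grammar, here is a plain C++17 reference sketch of the same dispatch idea without Spirit: the base is read first, then it selects the digit test for the rest of the literal at run time. Everything in it (`DigitTest`, `digits`, `parse_based_literal`) is hypothetical and only meant to describe the intended behaviour. It assumes the literal is stored with underscores removed, and it does no whitespace skipping.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

namespace ast {
    struct number {
        unsigned base;
        std::string literal;
    };
}

using DigitTest = bool (*)(char);

inline bool is_bin(char c) { return c == '0' || c == '1'; }
inline bool is_oct(char c) { return c >= '0' && c <= '7'; }
inline bool is_dec(char c) { return c >= '0' && c <= '9'; }
inline bool is_hex(char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; }

// Accepts digit (_? digit)* for the given digit test, i.e. a single '_' is
// allowed only between two digits. Returns the digits without the '_'.
std::optional<std::string> digits(std::string_view s, DigitTest ok) {
    std::string out;
    bool prev_digit = false;
    for (char c : s) {
        if (ok(c))                     { out += c; prev_digit = true; }
        else if (c == '_' && prev_digit) { prev_digit = false; }
        else                           return std::nullopt;
    }
    if (!prev_digit) return std::nullopt;  // empty input or trailing '_'
    return out;
}

std::optional<ast::number> parse_based_literal(std::string_view input) {
    // Run-time table: the parsed base picks the digit test, which plays the
    // role that the symbols table of parsers plays in the X3 code.
    static std::map<unsigned, DigitTest> const by_base{
        {2, is_bin}, {8, is_oct}, {10, is_dec}, {16, is_hex}};

    auto const hash = input.find('#');
    if (hash == std::string_view::npos) return std::nullopt;

    auto const base_digits = digits(input.substr(0, hash), is_dec);
    // The length guard keeps std::stoul from throwing on absurdly long bases.
    if (!base_digits || base_digits->size() > 4) return std::nullopt;

    auto const base  = static_cast<unsigned>(std::stoul(*base_digits));
    auto const entry = by_base.find(base);
    if (entry == by_base.end()) return std::nullopt;  // unsupported base

    auto literal = digits(input.substr(hash + 1), entry->second);
    if (!literal) return std::nullopt;
    return ast::number{base, std::move(*literal)};
}
```

With this sketch all four inputs above are accepted, while something like "2#0121" or "7#12" is rejected.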
Well, it doesn't compile, so I don't know whether it would work this way. I think it's
a better way than several lines of alternatives (as in my old code). Furthermore, looking at newer
standards of the grammar I'd like to implement, the syntax has been extended with a leading
integer (for the numeric width) and other base specifiers, e.g. 'UB', 'UO' and others. This
may be drifting off-topic, but: how can I prepare the code for further grammar extensions (using something like eps[get<std_tag>(ctx) == x42]
)?
For convenience, I've put the example on Coliru.