The language I'm trying to write a parser for has a statement that, in essence, sets properties for the text that follows it. These properties include:

  • case sensitivity
  • format (including different comment style)

The only way I can imagine implementing this is by switching to a different parser. I think this would require terminating the current parser successfully and returning, via its attribute, what to do with the rest of the unmatched input. How could one accomplish this?

sehe
Frank Puck

2 Answers

1

Use semantic actions and the qi::lazy directive inside the statement parser to invoke the appropriate parsers based on the specified properties.

0

Switching to a different parser is one way.

The most notable pattern for this is the Nabialek Trick, which builds on the qi::lazy directive.

However, since you already mention multiple flags, that approach might not scale: every combination of flags would need its own parser, which leads to unnecessary duplication and/or combinatorial explosion.
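To make the dispatch idea concrete, here is a minimal sketch in plain C++ rather than Spirit. A lookup table maps a mode name to the function that parses the statements that follow, which is roughly the role that qi::symbols and qi::lazy play in the Nabialek Trick. Everything in it is hypothetical and only for illustration: the `mode NAME;` statement, `parse_all`, `find_parser`, and the two mode parsers are not part of any library. Note that each mode needs its own table entry, so independent flags multiply the number of entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string_view>

// Hypothetical names throughout; this is plain C++, not Spirit.
using StatementParser = bool (*)(std::string_view);

// Strip leading and trailing whitespace from a statement.
inline std::string_view trim(std::string_view s) {
    auto is_space = [](unsigned char c) { return std::isspace(c) != 0; };
    while (!s.empty() && is_space(s.front())) s.remove_prefix(1);
    while (!s.empty() && is_space(s.back())) s.remove_suffix(1);
    return s;
}

// Case-insensitive comparison of two ASCII strings.
inline bool iequals(std::string_view a, std::string_view b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}

// One parser per mode; every flag combination would need its own entry.
inline bool parse_sensitive(std::string_view s) { return s == "Hello"; }
inline bool parse_insensitive(std::string_view s) { return iequals(s, "hello"); }

// Lookup table from mode name to parser; returns nullptr for unknown modes.
inline StatementParser find_parser(std::string_view mode) {
    static const std::map<std::string_view, StatementParser> table{
        {"sensitive", &parse_sensitive},
        {"insensitive", &parse_insensitive},
    };
    auto it = table.find(mode);
    return it == table.end() ? nullptr : it->second;
}

// Parse ';'-separated statements; "mode NAME" swaps the active parser.
inline bool parse_all(std::string_view input) {
    StatementParser active = &parse_sensitive; // default mode
    while (!input.empty()) {
        auto pos = input.find(';');
        auto stmt = trim(input.substr(0, pos));
        input.remove_prefix(pos == std::string_view::npos ? input.size() : pos + 1);
        if (stmt.empty()) continue; // tolerate empty statements
        constexpr std::string_view kw = "mode ";
        if (stmt.substr(0, kw.size()) == kw) {
            active = find_parser(trim(stmt.substr(kw.size())));
            if (!active) return false; // unknown mode name
        } else {
            assert(active != nullptr); // invariant: a parser is always selected
            if (!active(stmt)) return false;
        }
    }
    return true;
}
```

With these definitions, `parse_all("mode insensitive; heLLo;")` accepts its input, while `parse_all("hello;")` rejects it because the default mode is case-sensitive.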

I'd recommend using some parser state. You can do that with semantic actions that hold your logic, but that implies mutable state inside your parser, which may hurt re-entrancy, thread-safety and re-usability. Those are pretty general drawbacks of semantic actions.

Instead, Qi offers local attributes, which sit inside the runtime parser context.

As an example, let's switch case-sensitivity:

// sample coming up, making dinner as well

Post-Dinner Update

As always, time is a good teacher. I tried my hand at actually making locals/inherited attributes work for re-entrancy, and it didn't work the way I remembered.

So instead, let's embrace mutable state and put the option state right in the grammar instance. That way things stay at a feasible level of complexity, though you cannot always share parser instances.

// #define BOOST_SPIRIT_DEBUG
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
using namespace std::string_literals;

template <typename It> struct DemoParser : qi::grammar<It> {
    DemoParser() : DemoParser::base_type(start) {
        using namespace qi::labels;

        // shorthand mnemonics for accessing option state
        auto _case_option   = px::ref(case_opt);
        auto _strict_option = px::ref(strict_opt);
        qi::_r1_type kw_text; // another mnemonic, for the inherited attribute

        // handy phoenix actor (custom "directives")
        auto const _cs = qi::eps(_case_option == Sensitive);
        auto const _ci = qi::eps(_case_option == Insensitive);
     // auto const _sm = qi::eps(_strict_option == StrictOn);

        start = qi::skip(qi::space)[demo];

        demo = qi::eps[_case_option = Case::Sensitive]    // initialize
                      [_strict_option = Strict::StrictOn] // defaults?
            >> -(option | hello) % ';'                    //
            >> qi::eoi;

        option = kw("Option"s) >> (switch_case | switch_strict);
        hello                             //
            = _cs >> "Hello"              //
            | _ci >> qi::no_case["hello"] //
            ;

        _case_sym.add("sensitive", Case::Sensitive)("insensitive", Case::Insensitive);
        _strict_sym.add("on", Strict::StrictOn)("off", Strict::StrictOff);

        _case         = _cs >> _case_sym | _ci >> qi::no_case[_case_sym];
        _strict       = _cs >> _strict_sym | _ci >> qi::no_case[_strict_sym];
        switch_case   = kw("case"s) >> _case[_case_option = _1];
        switch_strict = kw("strict"s) >> _strict[_strict_option = _1];

        px::function c_str = [](std::string const& s) { return s.c_str(); };

        kw = (_cs >> qi::lit(c_str(kw_text))                 // case sensitive
              | _ci >> qi::no_case[qi::lit(c_str(kw_text))]) // case insensitive
            >> !qi::char_("a-zA-Z0-9._"); // lookahead assertion to avoid parsing partial identifiers

        BOOST_SPIRIT_DEBUG_NODES((start)(demo)(option)(hello)(switch_case)(switch_strict)(_case)(_strict)(kw))
    }

  private:
    qi::rule<It> start;

    enum Case { Sensitive, Insensitive } case_opt = Sensitive;
    enum Strict { StrictOff, StrictOn } strict_opt = StrictOn;
    qi::symbols<char, Case>   _case_sym;
    qi::symbols<char, Strict> _strict_sym;

    using Skipper = qi::space_type;
    qi::rule<It, Skipper> demo, hello, option, switch_case, switch_strict;

    // lexeme
    qi::rule<It, Case()> _case;
    qi::rule<It, Strict()> _strict;
    qi::rule<It, std::string(std::string kw_text)> kw; // using inherited attribute
};

int main() {
    for (std::string_view input :
         {
             "",
             "bogus;", // FAIL
             "Hello;",
             "hello;",
             "Option case insensitive; heLlO;",
             "Option strict off;",
             "Option STRICT off;",
             "Option case insensitive; Option STRICT off;",
             "Option case insensitive; oPTION STRICT off;",
             "Option case insensitive; oPTION STRICT ON;",
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;", // FAIL
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;",
         }) //
    {
        DemoParser<std::string_view::const_iterator> p; // mutable instance now
                                                        //
        bool ok = parse(begin(input), end(input), p);
        std::cout << quoted(input) << " -> " << (ok ? "PASS" : "FAIL") << std::endl;
    }
}

Printing the expected output for the test cases:

"" -> PASS
"bogus;" -> FAIL
"Hello;" -> PASS
"hello;" -> FAIL
"Option case insensitive; heLlO;" -> PASS
"Option strict off;" -> PASS
"Option STRICT off;" -> FAIL
"Option case insensitive; Option STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT ON;" -> PASS
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;" -> FAIL
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;" -> PASS

Improving Compile Times: X3

I honestly think that X3 is a bit more convenient for dynamically parameterizing/composing rules. It also compiles a lot faster, and it makes it easier to add debug side effects if desired:

// #define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace DemoParser {
    enum Case { Insensitive, Sensitive };
    enum Strict { StrictOff, StrictOn };
    struct Options {
        enum Case   case_opt   = Sensitive;
        enum Strict strict_opt = StrictOn;
    };

    // custom "directives"
    auto const _cs = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).case_opt == Sensitive; })];
    auto const _ci = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).case_opt == Insensitive; })];
 // auto const _sm = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).strict_opt == StrictOn; })];

    auto set_opt = [](auto member) {
        return [member](auto& ctx) {
            auto& opt = get<Options>(ctx).*member;
            x3::traits::move_to(_attr(ctx), opt); 
        };
    };

    static inline auto variable_case(auto p, char const* name = "variable_case") {
        using Attr = x3::traits::attribute_of<decltype(p), x3::unused_type, void>::type;
        return x3::rule<struct _, Attr, true>{name} = //
            (_cs >> x3::as_parser(p) |                //
             _ci >> x3::no_case[x3::as_parser(p)]);
    }

    static inline auto kw(char const* kw_text) {
        // using lookahead assertion to avoid parsing partial identifiers
        return x3::rule<struct kw, std::string>{kw_text} = x3::lexeme[ //
                   variable_case(x3::lit(kw_text), kw_text)            //
                   >> !x3::char_("a-zA-Z0-9._")                        //
        ];
    }

    auto _case_sym = x3::symbols<Case>{}.add("sensitive", Case::Sensitive)("insensitive", Case::Insensitive).sym;
    auto _strict_sym = x3::symbols<Strict>{}.add("on", Strict::StrictOn)("off", Strict::StrictOff).sym;

    auto switch_case   = kw("case") >> variable_case(_case_sym)[set_opt(&Options::case_opt)];
    auto switch_strict = kw("strict") >> variable_case(_strict_sym)[set_opt(&Options::strict_opt)];

    auto option = kw("Option") >> (switch_case | switch_strict);
    auto hello  = _cs >> "Hello"      //
        | _ci >> x3::no_case["hello"] //
        ;

    auto demo  = -(option | hello) % ';' >> x3::eoi;
    auto start = x3::skip(x3::space)[demo];
}

int main() {
    auto const p = DemoParser::start; // stateless parser
    using DemoParser::Options;

    for (std::string_view input :
         {
             "",
             "bogus;", // FAIL
             "Hello;",
             "hello;",
             "Option case insensitive; heLlO;",
             "Option strict off;",
             "Option STRICT off;",
             "Option case insensitive; Option STRICT off;",
             "Option case insensitive; oPTION STRICT off;",
             "Option case insensitive; oPTION STRICT ON;",
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;", // FAIL
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;",
         }) //
    {
        Options opts;

        bool ok = parse(begin(input), end(input), x3::with<Options>(opts)[p]);
        std::cout << quoted(input) << " -> " << (ok ? "PASS" : "FAIL") << std::endl;
    }
}

Still printing the same test output:

"" -> PASS
"bogus;" -> FAIL
"Hello;" -> PASS
"hello;" -> FAIL
"Option case insensitive; heLlO;" -> PASS
"Option strict off;" -> PASS
"Option STRICT off;" -> FAIL
"Option case insensitive; Option STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT ON;" -> PASS
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;" -> FAIL
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;" -> PASS

I'm a bit out of time for the qi::lazy approach, so I'll refer to my existing examples on this site.

sehe
  • In essence these are only 3 formats: format_1&&case_sensitive, format_1&&!case_sensitive, format_2&&!case_sensitive. So I would appreciate it if you could show how to use lazy for this. – Frank Puck Jun 26 '23 at 15:34
  • Yeah. I'm finding that doing the whole locals/inherited attribute thing gets unwieldy. The way I remembered it, the locals would "automatically" be passed from parent to child rule. Since that's apparently not happening (?! http://coliru.stacked-crooked.com/a/99bbba6d8cfb3a0b), I do suggest lazy rules. I'll have to come back to this in a few hours because of other work. Meanwhile you can search my existing answers for examples, of course – sehe Jun 26 '23 at 16:06
  • I updated my answer with the work I did since. Hope it helps – sehe Jun 27 '23 at 14:15