
I have a parser for a simple identifier such as foo, bar, or baz, and another for nested identifiers such as foo::bar, foo::bar.baz, or foo::bar.baz.baham. Both parse into the same AST struct, which looks like this:

struct identifier : x3::position_tagged{
    std::vector <std::string> namespaces;
    std::vector <std::string> classes;
    std::string identifier;

};

The parser for an identifier looks like this:

#define VEC_ATR x3::attr(std::vector<std::string>({})) //ugly hack

auto const identifier_def =
                VEC_ATR
                >> VEC_ATR
                >> id_string;

and for the nested_identifier like this:

auto const nested_identifier_def =
        x3::lexeme[
                (+(id_string >> "::") >> +(id_string >> ".") > id_string)
                | (+(id_string >> "::") >> VEC_ATR > id_string)
                | (VEC_ATR >> +(id_string >> ".") > id_string)
                | identifier

        ];

I know, shame on me for the macro. The identifier parser works fine, but nested_identifier behaves strangely: if I try to parse something like foo::bar::baz, the AST object that comes out of the parser has all the namespaces (in this case foo and bar) twice in the namespaces vector. I have a small example of this strange behaviour here. Can anybody explain why this happens and how I can avoid it?

akim
Exagon
  • What makes it useful to distinguish classes from namespaces? E.g. in C++, classes _are_ namespaces. – sehe Sep 16 '16 at 22:54
  • You are right; I wanted the AST to hold as much information as I could get. That's why I parse an id followed by a dot into the classes vector. Maybe classes isn't the best name, but I couldn't quickly find another. – Exagon Sep 16 '16 at 23:04

2 Answers


The reason you get that behaviour is that the alternative parser does not automatically roll back the changes made to the external attribute when one of its branches fails.

In your case this is what happens:

  • Initially the attribute is [{},{},""].
  • The first alternative branch is tried.
  • id_string >> "::" matches twice and adds foo and bar to the first vector ->[{foo,bar},{},""].
  • id_string >> "." fails to match -> the sequence fails -> the alternative branch fails (the attribute is not reset, so it still holds [{foo,bar},{},""]).
  • The second alternative branch is tried.
  • id_string >> "::" matches twice and adds foo and bar to the first vector ->[{foo,bar,foo,bar},{},""].
  • attr(vector<string>({})) succeeds (attr always succeeds) and substitutes the empty second vector with a vector with an empty string -> [{foo,bar,foo,bar},{""},""].
  • id_string matches and baz is added to the attribute ->[{foo,bar,foo,bar},{""},baz].
  • The second alternative branch succeeds.
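The steps above can be reproduced without Spirit. The following is a minimal hand-rolled model, not X3 code: the names Identifier, parse_id, parse_list, branch1, branch2, parse_naive, held and parse_held are all hypothetical, and only the first two alternatives of the grammar are modelled. It assumes, as described above, that each branch writes directly into the caller's attribute. parse_naive tries the branches on the same attribute, while parse_held runs each branch on a copy and commits only on success.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the AST struct from the question.
struct Identifier {
    std::vector<std::string> namespaces;
    std::vector<std::string> classes;
    std::string identifier;
};

// Reads [A-Za-z0-9_]+ at pos; advances pos only on success.
bool parse_id(const std::string& in, std::size_t& pos, std::string& out) {
    std::size_t p = pos;
    while (p < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[p])) || in[p] == '_'))
        ++p;
    if (p == pos) return false;
    out = in.substr(pos, p - pos);
    pos = p;
    return true;
}

// Models +(id >> sep): every matched id is appended straight into `out`.
bool parse_list(const std::string& in, std::size_t& pos,
                const std::string& sep, std::vector<std::string>& out) {
    bool matched = false;
    for (;;) {
        std::size_t p = pos;
        std::string id;
        if (!parse_id(in, p, id) || in.compare(p, sep.size(), sep) != 0) break;
        out.push_back(id);
        pos = p + sep.size();
        matched = true;
    }
    return matched;
}

// Models +(id >> "::") >> +(id >> ".") >> id
bool branch1(const std::string& in, Identifier& attr) {
    std::size_t pos = 0;
    return parse_list(in, pos, "::", attr.namespaces)
        && parse_list(in, pos, ".", attr.classes)
        && parse_id(in, pos, attr.identifier)
        && pos == in.size();
}

// Models +(id >> "::") >> id
bool branch2(const std::string& in, Identifier& attr) {
    std::size_t pos = 0;
    return parse_list(in, pos, "::", attr.namespaces)
        && parse_id(in, pos, attr.identifier)
        && pos == in.size();
}

// Alternative without rollback: a failed branch leaves its partial results behind.
bool parse_naive(const std::string& in, Identifier& attr) {
    return branch1(in, attr) || branch2(in, attr);
}

// Runs a branch on a copy and commits the copy only if the branch succeeds.
template <typename Branch>
bool held(Branch branch, const std::string& in, Identifier& attr) {
    Identifier copy = attr;
    if (!branch(in, copy)) return false;
    attr = std::move(copy);
    return true;
}

bool parse_held(const std::string& in, Identifier& attr) {
    return held(branch1, in, attr) || held(branch2, in, attr);
}
```

In this model, parsing foo::bar::baz with parse_naive leaves namespaces as {foo, bar, foo, bar}, while parse_held yields {foo, bar}.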

In Spirit.Qi the solution is quite easy: simply use the hold directive. Unfortunately, this directive is not yet implemented in Spirit.X3. A possible alternative is to put each of the alternative branches in its own x3::rule, either explicitly or with as<ast::identifier>(alternative_branch), as used here by sehe. Here is a simplified example that shows the as approach.

Another possibility is to implement the hold directive yourself. Here is my attempt (running on WandBox):

#include <boost/spirit/home/x3/support/context.hpp>
#include <boost/spirit/home/x3/core/skip_over.hpp>
#include <boost/spirit/home/x3/core/parser.hpp>

namespace boost { namespace spirit { namespace x3
{
    template <typename Subject>
    struct hold_directive : unary_parser<Subject, hold_directive<Subject>>
    {
        typedef unary_parser<Subject, hold_directive<Subject> > base_type;
        static bool const is_pass_through_unary = true;
        static bool const handles_container = Subject::handles_container;

        hold_directive(Subject const& subject)
          : base_type(subject) {}

        template <typename Iterator, typename Context
          , typename RContext, typename Attribute>
        bool parse(Iterator& first, Iterator const& last
          , Context const& context, RContext& rcontext, Attribute& attr) const
        {
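            // Parse into a copy so that a failed subject leaves attr untouched.
            Attribute copy(attr);
            if (this->subject.parse(first, last, context, rcontext, copy))
            {
                // Commit the result only after the subject has succeeded.
                traits::move_to(copy, attr);
                return true;
            }
            return false;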
        }

    };

    struct hold_gen
    {
        template <typename Subject>
        hold_directive<typename extension::as_parser<Subject>::value_type>
        operator[](Subject const& subject) const
        {
            return { as_parser(subject) };
        }
    };

    auto const hold = hold_gen{};
}}}
llonesmiz
  • I have never used this `as` before. How does it work? Thanks a lot for your help; I wasn't expecting this behaviour. – Exagon Sep 16 '16 at 23:08
  • I think it links to the place where it's explained :) – sehe Sep 16 '16 at 23:09
  • Oh sorry, I was too fast. – Exagon Sep 16 '16 at 23:12
  • @Exagon [Here](http://coliru.stacked-crooked.com/a/2aa67603f92d5b16) is a simplified example that shows the `as` approach. – llonesmiz Sep 16 '16 at 23:33
  • @sehe and jv_ I would be screwed without you two. Thank you for helping me all the time with Spirit. – Exagon Sep 17 '16 at 09:07
  • @sehe, why have your working as and hold parsers not been added to X3? – rmawatson Jul 11 '18 at 17:21
  • @rmawatson I didn't propose them. I reckon if they would, they'd add them in low-level style. The whole point for me is that you don't need special library support for these, as they're pretty easily expressed in C++14. – sehe Jul 11 '18 at 17:28
  • @rmawatson More testament to the versatility of C++14 with X3: [porting a subset of phoenix for semantic actions into X3](https://github.com/sehe/expression-parsers/blob/x3-c%2B%2B17/phoeni_x3.hpp) – sehe Jul 11 '18 at 17:32
  • @sehe It would be really great to see them merged in. It's a shame to have such useful bits floating about on SO. – rmawatson Jul 11 '18 at 21:34
  • I'm in favor of libraries with a maintainable documented interface. Picking the right core set to maintain is not an easy task. – sehe Jul 11 '18 at 21:36

Please note that as of Boost 1.70, the solution proposed by @sehe no longer works (see this discussion).

The only workaround now is to refactor the grammar, so that the rollback wouldn't be needed.
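One way to picture such a refactoring is a single pass that reads one id at a time and uses the separator that follows it to decide where the id belongs, so nothing is written speculatively. The sketch below is a hand-rolled illustration in plain C++, not an X3 grammar; Identifier, parse_id and parse_nested are hypothetical names, and it assumes, as in the original grammar, that namespaces (::) may not appear after a class (.) has been seen.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the AST struct from the question.
struct Identifier {
    std::vector<std::string> namespaces;
    std::vector<std::string> classes;
    std::string identifier;
};

// Reads [A-Za-z0-9_]+ at pos; advances pos only on success.
bool parse_id(const std::string& in, std::size_t& pos, std::string& out) {
    std::size_t p = pos;
    while (p < in.size() &&
           (std::isalnum(static_cast<unsigned char>(in[p])) || in[p] == '_'))
        ++p;
    if (p == pos) return false;
    out = in.substr(pos, p - pos);
    pos = p;
    return true;
}

// Single pass: each id is classified by the separator that follows it,
// so no branch ever has to be undone.
bool parse_nested(const std::string& in, Identifier& out) {
    Identifier result;
    std::size_t pos = 0;
    bool seen_dot = false;
    for (;;) {
        std::string id;
        if (!parse_id(in, pos, id)) return false;
        if (in.compare(pos, 2, "::") == 0) {
            if (seen_dot) return false;  // a namespace may not follow a class
            result.namespaces.push_back(std::move(id));
            pos += 2;
        } else if (in.compare(pos, 1, ".") == 0) {
            seen_dot = true;
            result.classes.push_back(std::move(id));
            pos += 1;
        } else {
            result.identifier = std::move(id);  // last id, no separator after it
            break;
        }
    }
    if (pos != in.size()) return false;  // trailing input is an error
    out = std::move(result);
    return true;
}
```

With this model, foo::bar::baz yields namespaces {foo, bar} and identifier baz, and foo::bar.baz.baham yields namespaces {foo}, classes {bar, baz} and identifier baham.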

Igor R.