This small sample grammar parses just these statements:

a        <--special (but ok because rule in grammar)
a()
a.b      <--special
a.b()
a.b().c  <--special
a().b.c()
a().b    <--special

All cases without () at the end are special and should be separate rules in Spirit. Only the rule marked "special case 1" is correct so far. How do I define a rule that captures all the other cases without () at the end?

  lvalue_statement =

    (
      name >> +(
          (lit('(') >> paralistopt >> lit(')')) [_normal_action_call]
        | (lit('.') >> name)                   [_normal_action_dot]
        )
      
      | name                                   [_special_action_]  // special case 1
    );
    

Another sample to explain what "special" means: the ROOT node should get the special AST node or action.

a.b         -> SPECIAL_DOT(a,b)
a.b.c       -> SPECIAL_DOT(a,NORMAL_DOT(b,c))
a(para).b.c -> SPECIAL_DOT(NORMAL_DOT(CALL(a,para),b),c)
Markus
  • I see only one rule. What is the special rule? – sehe Jul 03 '20 at 12:53
  • To the edit: is it "special" if it is the "trailing limb" only? Surely you can detect that from the ast during evaluation? – sehe Jul 03 '20 at 15:06
  • yes i can detect this in the AST .. but i would prefer to handle this inside boost and create a different AST in this case with this SPECIAL_DOT node – Markus Jul 03 '20 at 15:09

1 Answer

I'm quite averse to so many semantic actions¹.

I also think that's not your problem.

In language terms, you'd expect a.b to be member dereference, a() to be invocation, and hence a.b() would be invocation of a.b after the member dereference.

In that sense, a.b is the normal case, because it doesn't do invocation. a.b() would be "more special" in the sense that it is the same PLUS invocation.

I'd phrase my expression grammar to reflect this:

lvalue = name >> *(
        '.' >> name
      | '(' >> paralistopt >> ')'
    );

This parses everything. Now you can go with either semantic actions or attribute propagation.
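As a cross-check of what that rule accepts, here is a minimal hand-written recognizer that mirrors it without Spirit. It is only a sketch: the namespace and function names are made up for illustration, whitespace skipping is left out, and the definition of name (a letter followed by letters, digits or underscores) is taken from the full listing in this answer.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the rule
//   lvalue = name >> *( '.' >> name | '(' >> paralistopt >> ')' )
namespace RecognizerSketch {
    inline bool is_alpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
    inline bool is_word(char c)  { return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_'; }

    // name = alpha >> *(alnum | '_'); consumes from `in` only on success
    inline bool parse_name(std::string_view& in) {
        if (in.empty() || !is_alpha(in.front())) return false;
        std::size_t n = 1;
        while (n < in.size() && is_word(in[n])) ++n;
        in.remove_prefix(n);
        return true;
    }

    // paralistopt = -(name % ','); optional, so it cannot fail
    inline void parse_paralistopt(std::string_view& in) {
        std::string_view probe = in;
        if (!parse_name(probe)) return;       // empty parameter list
        in = probe;
        while (!probe.empty() && probe.front() == ',') {
            probe.remove_prefix(1);
            if (!parse_name(probe)) break;    // a dangling ',' is not consumed
            in = probe;
        }
    }

    // True if the whole input matches the lvalue rule
    inline bool matches_lvalue(std::string_view in) {
        if (!parse_name(in)) return false;
        for (;;) {
            std::string_view probe = in;
            if (!probe.empty() && probe.front() == '.') {
                probe.remove_prefix(1);
                if (!parse_name(probe)) break;            // '.' must be followed by a name
            } else if (!probe.empty() && probe.front() == '(') {
                probe.remove_prefix(1);
                parse_paralistopt(probe);
                if (probe.empty() || probe.front() != ')') break;
                probe.remove_prefix(1);
            } else {
                break;                                    // no further postfix operation
            }
            in = probe;                                   // commit one postfix operation
        }
        return in.empty();
    }
}
```

The loop body corresponds to the Kleene star: every iteration commits either one member access or one call, so "a", "a.b" and "a().b.c()" all go through the same rule.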

Semantic Actions #1

auto lvalue = name [ action("normal") ] >> *(
        '.' >> name [ action("member_access") ]
      | ('(' >> paralistopt >> ')') [ action("call") ]
    );

There you go. Let's come up with a generic action that logs stuff:

auto action = [](auto type) {
    return [=](auto& ctx){
        auto& attr = _attr(ctx);
        using A = std::decay_t<decltype(attr)>;

        std::cout << type << ":";
        if constexpr(boost::fusion::traits::is_sequence<A>::value) {
            std::cout << boost::fusion::as_vector(attr);
        } else if constexpr(x3::traits::is_container<A>::value && not std::is_same_v<std::string, A>) {
            std::string_view sep;
            std::cout << "{";
            for (auto& el : attr) { std::cout << sep << el; sep = ", "; }
            std::cout << "}";
        } else {
            std::cout << attr;
        }
        std::cout << "\n";
    };
};

Now we can parse all the samples (plus a few more):

Live On Coliru prints:

 === "a"
normal:a
Ok
 === "a()"
normal:a
call:{}
Ok
 === "a.b"
normal:a
member_access:b
Ok
 === "a.b()"
normal:a
member_access:b
call:{}
Ok
 === "a.b().c"
normal:a
member_access:b
call:{}
member_access:c
Ok
 === "a().b.c()"
normal:a
call:{}
member_access:b
member_access:c
call:{}
Ok
 === "a().b.c()"
normal:a
call:{}
member_access:b
member_access:c
call:{}
Ok
 === "a(q,r,s).b"
normal:a
call:{q, r, s}
member_access:b
Ok

SA #2: Building an AST

Let's model the AST:

namespace Ast {
    using name   = std::string;
    using params = std::vector<name>;

    struct member_access;
    struct call;

    using lvalue = boost::variant<
        name,
        boost::recursive_wrapper<member_access>,
        boost::recursive_wrapper<call>
    >;

    struct member_access { lvalue obj; name member; } ;
    struct call          { lvalue f; params args;   } ;
}

Now we can replace the actions:

auto lvalue
    = rule<struct lvalue_, Ast::lvalue> {"lvalue"}
    = name [ ([](auto& ctx){ _val(ctx) = _attr(ctx); }) ] >> *(
        '.' >> name [ ([](auto& ctx){ _val(ctx) = Ast::member_access{ _val(ctx), _attr(ctx) }; }) ]
      | ('(' >> paralistopt >> ')') [ ([](auto& ctx){ _val(ctx) = Ast::call{ _val(ctx), _attr(ctx) }; }) ]
    );

That's ugly - I don't recommend writing your code this way, but at least it drives home how few steps are involved.
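Those few steps amount to a left fold: every postfix operation wraps whatever value has been built so far. The sketch below replays that by hand outside of Spirit. It is an illustration under assumptions, not the code from this answer: it uses std::variant with std::shared_ptr in place of boost::variant with recursive_wrapper, and the names FoldSketch, dot, invoke, render and example are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for the Ast namespace; shared_ptr plays the role
// that boost::recursive_wrapper plays in the answer's AST.
namespace FoldSketch {
    using name   = std::string;
    using params = std::vector<name>;

    struct member_access;
    struct call;

    using lvalue = std::variant<name,
                                std::shared_ptr<member_access>,
                                std::shared_ptr<call>>;

    struct member_access { lvalue obj; name member; };
    struct call          { lvalue f;   params args; };

    // Equivalent of: _val = Ast::member_access{ _val, _attr }
    inline lvalue dot(lvalue lhs, name member) {
        return std::make_shared<member_access>(
            member_access{std::move(lhs), std::move(member)});
    }

    // Equivalent of: _val = Ast::call{ _val, _attr }
    inline lvalue invoke(lvalue lhs, params args = {}) {
        return std::make_shared<call>(call{std::move(lhs), std::move(args)});
    }

    // Renders the tree back to source-like text
    inline std::string render(lvalue const& v) {
        struct printer {
            std::string operator()(name const& n) const { return n; }
            std::string operator()(std::shared_ptr<member_access> const& m) const {
                return render(m->obj) + "." + m->member;
            }
            std::string operator()(std::shared_ptr<call> const& c) const {
                std::string out = render(c->f) + "(";
                for (std::size_t i = 0; i < c->args.size(); ++i)
                    out += (i ? ", " : "") + c->args[i];
                return out + ")";
            }
        };
        return std::visit(printer{}, v);
    }

    // Replays the actions for the input "a(q,r,s).b"
    inline lvalue example() {
        lvalue val = name{"a"};               // name [ passthrough ]
        val = invoke(val, {"q", "r", "s"});   // '(' paralistopt ')'
        val = dot(val, "b");                  // '.' name
        return val;
    }
}
```

Because each step takes the previous value as its left operand, the last operation in the input always ends up at the root of the tree.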

Also adding some output operators:

namespace Ast { // debug output
    static inline std::ostream& operator<<(std::ostream& os, Ast::member_access const& ma) {
        return os << ma.obj << "." << ma.member;
    }
    static inline std::ostream& operator<<(std::ostream& os, Ast::call const& c) {
        std::string_view sep;
        os << c.f << "(";
        for (auto& arg: c.args) { os << sep << arg; sep = ", "; }
        return os << ")";
    }
}

Now we can parse everything into a full AST: Live On Coliru, printing:

"a" -> a
"a()" -> a()
"a.b" -> a.b
"a.b()" -> a.b()
"a.b().c" -> a.b().c
"a().b.c()" -> a().b.c()
"a().b" -> a().b
"a(q,r,s).b" -> a(q, r, s).b

Automatic Propagation

Actually I sort of got stranded doing this. It took me too long to get it right and parse the associativity in a useful way, so I stopped trying. Let's instead summarize by cleaning up our second SA take:

Summary

Making the actions more readable:

auto passthrough =
    [](auto& ctx) { _val(ctx) = _attr(ctx); };
template <typename T> auto binary_ =
    [](auto& ctx) { _val(ctx) = T { _val(ctx), _attr(ctx) }; };

auto lvalue
    = rule<struct lvalue_, Ast::lvalue> {"lvalue"}
    = name [ passthrough ] >> *(
        '.' >> name                 [ binary_<Ast::member_access> ]
      | ('(' >> paralistopt >> ')') [ binary_<Ast::call> ]
    );

Now there are a number of issues left:

You might want a more general expression grammar that doesn't just parse lvalue expressions (e.g. f(foo, 42) should probably parse, as should len("foo") + 17).

To that end, the lvalue/rvalue distinction doesn't belong in the grammar: it's mostly a semantic distinction.

I happen to have created an extended parser that does all that + evaluation against proper LValues (while supporting general values). I'd suggest looking at the extended chat at this answer and the resulting code on GitHub: https://github.com/sehe/qi-extended-parser-evaluator .
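On the point that such distinctions are semantic: with a left-folded tree, the question's "special" case (input ending in a plain .name) is simply a root node that is a member access, so it can be checked after parsing instead of being encoded in the grammar. The following is a minimal sketch under assumptions: it uses std::variant with std::shared_ptr rather than the boost::variant AST from this answer, and SpecialSketch, dot, invoke and is_special_dot are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in AST, shaped like the Ast namespace in this answer
namespace SpecialSketch {
    using name = std::string;

    struct member_access;
    struct call;

    using lvalue = std::variant<name,
                                std::shared_ptr<member_access>,
                                std::shared_ptr<call>>;

    struct member_access { lvalue obj; name member; };
    struct call          { lvalue f;   std::vector<name> args; };

    inline lvalue dot(lvalue lhs, name member) {
        return std::make_shared<member_access>(
            member_access{std::move(lhs), std::move(member)});
    }

    inline lvalue invoke(lvalue lhs, std::vector<name> args = {}) {
        return std::make_shared<call>(call{std::move(lhs), std::move(args)});
    }

    // True when the expression ends in ".name", i.e. the root is a member access.
    // A bare name ("a") and anything ending in a call ("a.b()") are not flagged.
    inline bool is_special_dot(lvalue const& root) {
        return std::holds_alternative<std::shared_ptr<member_access>>(root);
    }
}
```

A later pass could use such a check to rewrite the root into a dedicated SPECIAL_DOT node if the evaluator needs one; whether that is preferable to a separate grammar rule is a design choice.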

Full Listing

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>
namespace x3 = boost::spirit::x3;

namespace Ast {
    using name   = std::string;
    using params = std::vector<name>;

    struct member_access;
    struct call;

    using lvalue = boost::variant<
        name,
        boost::recursive_wrapper<member_access>,
        boost::recursive_wrapper<call>
    >;

    struct member_access { lvalue obj; name member; } ;
    struct call          { lvalue f; params args;   } ;
}

namespace Ast { // debug output
    static inline std::ostream& operator<<(std::ostream& os, Ast::member_access const& ma) {
        return os << ma.obj << "." << ma.member;
    }
    static inline std::ostream& operator<<(std::ostream& os, Ast::call const& c) {
        std::string_view sep;
        os << c.f << "(";
        for (auto& arg: c.args) { os << sep << arg; sep = ", "; }
        return os << ")";
    }
}

namespace Parser {
    using namespace x3;

    auto name
        = rule<struct string_, Ast::name> {"name"}
        = lexeme[alpha >> *(alnum|char_("_"))];

    auto paralistopt
        = rule<struct params_, Ast::params> {"params"}
        = -(name % ',');

    auto passthrough =
        [](auto& ctx) { _val(ctx) = _attr(ctx); };
    template <typename T> auto binary_ =
        [](auto& ctx) { _val(ctx) = T { _val(ctx), _attr(ctx) }; };

    auto lvalue
        = rule<struct lvalue_, Ast::lvalue> {"lvalue"}
        = name [ passthrough ] >> *(
            '.' >> name                 [ binary_<Ast::member_access> ]
          | ('(' >> paralistopt >> ')') [ binary_<Ast::call> ]
        );

    auto start = skip(space) [ lvalue ];
}

int main() {
    for (std::string const input: {
            "a",       // special (but ok because rule in grammar)
            "a()",
            "a.b",     // special
            "a.b()",
            "a.b().c", // special
            "a().b.c()",
            "a().b",   // special
            "a(q,r,s).b",
        })
    {
        std::cout << std::quoted(input) << " -> ";

        auto f = begin(input), l = end(input);
        Ast::lvalue parsed;
        if (parse(f, l, Parser::start, parsed)) {
            std::cout << parsed << "\n";
        } else {
            std::cout << "Failed\n";
        }
        if (f!=l) {
            std::cout << " -- Remaining: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}

Prints

"a" -> a
"a()" -> a()
"a.b" -> a.b
"a.b()" -> a.b()
"a.b().c" -> a.b().c
"a().b.c()" -> a().b.c()
"a().b" -> a().b
"a(q,r,s).b" -> a(q, r, s).b

¹ (they lead to a mess in the presence of backtracking, see Boost Spirit: "Semantic actions are evil"?)

sehe
  • In the off-chance that you were actually using Qi (you didn't say): **[Qi version](http://coliru.stacked-crooked.com/a/ddb5a2f5123d89d0)** or **[Alternative take](https://wandbox.org/permlink/3O8M1qSXJhjcyvAi)** – sehe Jul 03 '20 at 14:59
  • thank you for the good sample .. i add more details in the original post. Maybe i was not clear enough about the topic here. The Goal is to detect and produce a special AST-Node when the input ends with just a ".name". (named in the additional sample as SPECIAL_DOT) – Markus Jul 03 '20 at 15:07
  • Doing the replies here https://chat.stackoverflow.com/rooms/217164/parsers – sehe Jul 03 '20 at 15:22