1

I am trying to use boost spirit to create a parser for a simple language. The first statement I am trying to parse is a simple numeric addition: "3.14 + 1". It segfaults, and my research indicates it is because of a left-recursive implementation, but I can't wrap my head around the solution for this that isn't left-recursive. Here is my parser implementation:

template<typename Iterator>
struct simple_grammar : qi::grammar<Iterator, AstNodePtr(), ascii::space_type> {

  simple_grammar() : simple_grammar::base_type(start) {
    using qi::double_;
    using qi::_1;
    using qi::_2;
    using qi::_3;

    auto addhelper = '+' >> expression;
    auto add = expression >> addhelper;

    expression = add[qi::_val = make_shared_<AddNode>()(_1, _2)]
               | double_[qi::_val = make_shared_<DoubleNode>()(_1)];

    start = expression;
  }


  qi::rule<Iterator, AstNodePtr(), ascii::space_type> start;
  qi::rule<Iterator, AstNodePtr(), ascii::space_type> expression;
};

AstNodePtr is a typedef for a std::shared_ptr to an AstNodeBase class, from which DoubleNode and AddNode inherit. I didn't include those definitions just to keep the post short, but can if requested. It is pretty self-explanatory I think, the idea is to compose an AST. The make_shared_ implementation comes from here.

Thanks in advance!

Alex Court
  • 148
  • 11

1 Answers1

1

There are a number of things about your example.

First off, you use auto with Spirit Qi expressions. That's not valid, and leads to UB:

Next off, you chose to use polymorphic Ast nodes. That's possible but likely not efficient:

Finally, there's the left recursion, since your expression starts with itself, leading to infinite recursion. The only way to solve it is to split your productions up into "levels" of expressions. This also aids in generating the desired operator precedence:

 expression = term >> char_("-+") >> term;
 term = factor >> char_("*/%") >> factor;
 factor = simple >> char_("^") >> simple;

In your case, I'd suggest:

    simple
        = qi::double_
            [_val = make_shared_<DoubleNode>()(_1)]; 
        ;

    expression 
        = (simple >> '+' >> expression)
            [_val = make_shared_<AddNode>()(_1, _2)]
        | simple
        ;

Of course, you can be simpler and slightly less inefficient here:

    expression 
        = simple [_val = _1] 
        >> *(('+' >> expression)
                [_val = make_shared_<AddNode>()(_val, _0)])
        ;

Full Demo

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace { // https://stackoverflow.com/a/21565350/85371

    template <typename T>
    struct make_shared_f
    {
        template <typename... A> struct result 
            { typedef std::shared_ptr<T> type; };

        template <typename... A>
        typename result<A...>::type operator()(A&&... a) const {
            return std::make_shared<T>(std::forward<A>(a)...);
        }
    };
}

template <typename T>
using make_shared_ = boost::phoenix::function<make_shared_f<T> >;

struct AstNode {
    virtual ~AstNode() = default;
};
using AstNodePtr = std::shared_ptr<AstNode>;
struct DoubleNode : AstNode {
    DoubleNode(double) {}
};
struct AddNode : AstNode {
    AddNode(AstNodePtr, AstNodePtr) {}
};

#include <iomanip>

namespace qi = boost::spirit::qi;

template<typename Iterator>
struct simple_grammar : qi::grammar<Iterator, AstNodePtr()> {

    simple_grammar() : simple_grammar::base_type(start) {
        using namespace qi::labels;

        simple
            = qi::double_
                [_val = make_shared_<DoubleNode>()(_1)]; 
            ;

        expression 
            = simple [_val = _1] 
            >> *(('+' >> expression)
                    [_val = make_shared_<AddNode>()(_val, _1)])
            ;

        start = qi::skip(qi::space) [ expression ];
        BOOST_SPIRIT_DEBUG_NODES((start)(expression)(simple))
    }
  private:
    qi::rule<Iterator, AstNodePtr()> start;
    qi::rule<Iterator, AstNodePtr(), qi::space_type> expression;
    // implicit lexemes
    qi::rule<Iterator, AstNodePtr()> simple;
};

int main() {
    simple_grammar<std::string::const_iterator> g;

    for (std::string const input : {
            "1 + 2",
            "3.14"
            })
    {
        auto f = begin(input), l = end(input);
        AstNodePtr ast;
        if (qi::parse(f, l, g, ast)) {
            std::cout << "Succeeded: " << boost::core::demangle(typeid(*ast).name()) << "\n";
        } else {
            std::cout << "Failed\n";
        }

        if (f!=l)
            std::cout << "Remaining unparsed " << std::quoted(std::string(f,l)) << "\n";
    }
}

Prints

Succeeded: AddNode
Succeeded: DoubleNode
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Off-topic: For a good case study where complete C++ compatible operator precedence and associativity is reached (without too much ast/rule overhead) see my recent [chat](https://chat.stackoverflow.com/rooms/210289/discussion-between-sehe-and-llm) at this [answer](https://stackoverflow.com/questions/60845800/how-to-combine-skipping-and-non-skipping-lexeme-rules/60846101#60846101). It **does** show how you can have awesome polymorphic ASTs without _dynamic_ polymorphism – sehe Apr 14 '20 at 10:45
  • Extended the example with support for `-+*/%^` with proper precedence (as well as parenthesized subexpressions and unary `-+`). Still using the polymorphic Ast for instructional purposes: http://coliru.stacked-crooked.com/a/7a053d937aef0ba7 – sehe Apr 14 '20 at 12:22
  • Wow that is fantastic, thank you! I am required to use a polymorphic AST as this is for an OO class (the AST is really the purpose, not the parsing mechanism). But, doesn't this solution implicate that only things that can reduce to simple can be on the left side of an operator? Also, I took the bad advice about using auto for the nested rules from here: http://pastexen.com/code.php?file=uYv1RjUJZt.cpp&raw – Alex Court Apr 14 '20 at 15:54
  • Thought of this as a potentially more elegant way to compose the Ast nodes: [using a generator type (`X`)](http://coliru.stacked-crooked.com/a/68c5d6995d8116f5). No more phoenix business with shared-pointers. All overloads in `X` are "normal C++". – sehe Apr 14 '20 at 23:32