I am trying to use a subclass of the generated BaseVisitor class to construct an AST from the parse tree of the grammar I am using for a simple compiler.

Consider a subset of my grammar, where stat is a statement of a lightweight language:

...
program: stat;

stat: SkipStat                         # Skip
    | type Ident AssignEq assignRHS    # Declare
    | assignLHS AssignEq assignRHS     # Assign

...

My understanding (as per this post) is that the visitor should call visit(ctx->stat()), where ctx has type ProgramContext*. The derived visitor then correctly dispatches to the corresponding overridden visitSkip(..), visitDeclare(..), etc.
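To make that dispatch concrete, here is a simplified, self-contained model of how labeled alternatives are routed. It does not use the ANTLR runtime: Visitor, ParseTree, SkipContext and DeclareContext are stand-ins for the generated classes. Each # Label becomes its own context subclass whose accept forwards to the matching visitX method, so a single visit call on the stat child lands in the right override. In the real generated base visitor the default methods call visitChildren; here they just return an empty std::any.

```cpp
#include <any>
#include <cassert>
#include <string>

// Simplified stand-ins for ANTLR's runtime/generated classes (not the real API).
struct Visitor;

struct ParseTree {
    virtual ~ParseTree() = default;
    virtual std::any accept(Visitor* v) = 0;
};

struct SkipContext;     // generated from the "# Skip" label
struct DeclareContext;  // generated from the "# Declare" label

struct Visitor {
    virtual ~Visitor() = default;
    // visit() asks the node to call back the matching visitX method.
    std::any visit(ParseTree* tree) {
        assert(tree != nullptr);
        return tree->accept(this);
    }
    virtual std::any visitSkip(SkipContext*) { return {}; }
    virtual std::any visitDeclare(DeclareContext*) { return {}; }
};

struct SkipContext : ParseTree {
    std::any accept(Visitor* v) override { return v->visitSkip(this); }
};

struct DeclareContext : ParseTree {
    std::any accept(Visitor* v) override { return v->visitDeclare(this); }
};

// A user-defined visitor overrides only the alternatives it handles.
struct MyVisitor : Visitor {
    std::any visitSkip(SkipContext*) override { return std::string("Skip"); }
    std::any visitDeclare(DeclareContext*) override { return std::string("Declare"); }
};
```

Calling MyVisitor{}.visit(&someSkipContext) therefore returns a std::any holding "Skip" without the caller knowing which alternative was matched.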

I have simple node classes for my AST; again, a subset looks as follows:

#include <memory>
#include <string>
#include <utility>

// Defined elsewhere in the full AST (not part of this subset).
struct AssignLHS;
struct AssignRHS;
struct Type;

struct BaseNode {
    virtual ~BaseNode() = default; // polymorphic base, needed for dynamic_pointer_cast
};

// Stat must be declared before Program, which stores a shared_ptr<Stat>.
struct Stat : BaseNode {};

struct Program : BaseNode {
    Program(std::shared_ptr<Stat> body) : body(std::move(body)) {}
    std::shared_ptr<Stat> body;
};

struct Assign : Stat {
    Assign(std::shared_ptr<AssignLHS> lhs, std::shared_ptr<AssignRHS> rhs) :
        lhs(std::move(lhs)),
        rhs(std::move(rhs)) {}

    std::shared_ptr<AssignLHS> lhs;
    std::shared_ptr<AssignRHS> rhs;
};

struct Declare : Stat {
    Declare(std::shared_ptr<Type> type, std::string name, std::shared_ptr<AssignRHS> rhs) :
        type(std::move(type)),
        name(std::move(name)),
        rhs(std::move(rhs)) {}

    std::shared_ptr<Type> type;
    std::string name;
    std::shared_ptr<AssignRHS> rhs;
};

struct Skip : Stat {};

Tying the two points together, I am trying to have visitSkip(..), visitDeclare(..), etc. (which all return std::any) return a std::any holding a std::shared_ptr<Skip>, std::shared_ptr<Declare>, etc., so that visitProgram(..) can retrieve the statement from a call to visit in the form:

std::shared_ptr<Stat> stat = std::any_cast<std::shared_ptr<Stat>>(visit(ctx->stat()));

However (!), std::any_cast only succeeds when the requested type exactly matches the stored type. It will not convert a stored std::shared_ptr<Skip> into a std::shared_ptr<Stat>, so the cast above throws std::bad_any_cast. Instead, I have started to write my own visitor entirely separate from (i.e. not derived from) the generated one.
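A commonly used workaround, assuming every visitX method for an alternative of stat can agree on one result type, is to upcast before wrapping the result: have visitSkip, visitDeclare, etc. all put a std::shared_ptr<Stat> into the std::any. The sketch below uses free functions (hypothetical names) in place of visitor methods so it compiles without the ANTLR runtime, and contrasts the exact-type version with the upcast version.

```cpp
#include <any>
#include <cassert>
#include <memory>

struct BaseNode { virtual ~BaseNode() = default; };
struct Stat : BaseNode {};
struct Skip : Stat {};

// Mirrors the question's visitSkip: the std::any stores shared_ptr<Skip>.
std::any visitSkipExact() {
    return std::make_shared<Skip>();
}

// Workaround: convert to the common base type before it goes into std::any,
// so every alternative of `stat` stores the same type, shared_ptr<Stat>.
std::any visitSkipUpcast() {
    return std::shared_ptr<Stat>(std::make_shared<Skip>());
}

// any_cast on a pointer-to-any returns nullptr on a type mismatch instead of throwing.
bool holdsStat(const std::any& a) {
    return std::any_cast<std::shared_ptr<Stat>>(&a) != nullptr;
}

// What visitProgram would do with the result of visit(ctx->stat()).
std::shared_ptr<Stat> asStat(const std::any& a) {
    const auto* stat = std::any_cast<std::shared_ptr<Stat>>(&a);
    assert(stat != nullptr && "visitor did not return shared_ptr<Stat>");
    return *stat;
}
```

The trade-off is that the concrete node type is then only recoverable through std::dynamic_pointer_cast, which requires BaseNode to be polymorphic (for example via a virtual destructor).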

I assume there is a better solution using the generated classes that I am missing.

Having found this answer to a not dissimilar post, is it even worth constructing an AST? If my understanding of how to use ANTLR 4 is inaccurate, please let me know and point me towards a good resource to start from. Thanks.

Edit: Following chapter 7 of The Definitive ANTLR 4 Reference, I believe I could achieve this with a stack holding BaseNode* and casting the popped nodes appropriately. This does not seem like the best solution. Ideally, I would like something similar to the Java target, where the generated visitor classes take the expected return type as a type parameter (e.g. MyVisitor<T>).
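In the C++ target the visitor methods return std::any rather than a type parameter, so the closest equivalent to the Java style is usually a small typed wrapper around visit plus std::any_cast. A minimal sketch, assuming the wrapper lives in your own visitor subclass; visitAs is a hypothetical name, and FakeVisitor/FakeTree stand in for the generated classes so the example compiles on its own.

```cpp
#include <any>
#include <cassert>
#include <memory>
#include <string>

struct BaseNode { virtual ~BaseNode() = default; };
struct Stat : BaseNode {};
struct Skip : Stat {};

// Stand-in for a parse-tree context; real code would use the generated classes.
struct FakeTree { std::string label; };

struct FakeVisitor {
    // Stand-in for the generated visit(), which returns std::any.
    std::any visit(FakeTree* t) {
        assert(t != nullptr);
        if (t->label == "skip")
            return std::shared_ptr<Stat>(std::make_shared<Skip>());
        return std::string("not a statement");
    }

    // Hypothetical typed wrapper: the caller names the expected result type,
    // playing the role of the type argument in the Java target's Visitor<T>.
    // Throws std::bad_any_cast if visit() produced a different type.
    template <typename T>
    T visitAs(FakeTree* t) {
        return std::any_cast<T>(visit(t));
    }
};
```

Usage would look like auto stat = visitAs<std::shared_ptr<Stat>>(ctx->stat()); inside visitProgram. Unlike Java, the type is only checked at run time, so it still pays to keep each rule's return type consistent.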

Edit 2: I have now implemented such a solution (using a listener instead of a visitor); below is the exitAssign(..) function as an example:

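void Listener::exitAssign(Parser::AssignContext* ctx) {
    // Children were pushed left to right (lhs, then rhs), so rhs is on top.
    const auto rhs = std::static_pointer_cast<AssignRHS>(m_stack.top());
    m_stack.pop();

    const auto lhs = std::static_pointer_cast<AssignLHS>(m_stack.top());
    m_stack.pop();

    // Replace the two children with the combined node for the parent rule to pop.
    m_stack.push(std::make_shared<Assign>(lhs, rhs));
}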

I still feel this solution is not ideal: it feels hacky because the arguments must be popped in reverse order, and it is easy to forget to push the new AST node onto the stack after creating it.
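The ordering problem can at least be made to fail loudly. One option, sketched here with a hypothetical NodeStack wrapper (not an ANTLR class) around the same stack of std::shared_ptr<BaseNode>, is to replace static_pointer_cast with a checked std::dynamic_pointer_cast, so a push/pop mismatch throws instead of producing a mis-typed pointer. The node structs are trimmed copies of the ones in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stack>
#include <stdexcept>
#include <utility>

// Trimmed copies of the question's node types.
struct BaseNode { virtual ~BaseNode() = default; };
struct AssignLHS : BaseNode {};
struct AssignRHS : BaseNode {};
struct Stat : BaseNode {};
struct Assign : Stat {
    Assign(std::shared_ptr<AssignLHS> lhs, std::shared_ptr<AssignRHS> rhs)
        : lhs(std::move(lhs)), rhs(std::move(rhs)) {}
    std::shared_ptr<AssignLHS> lhs;
    std::shared_ptr<AssignRHS> rhs;
};

// Hypothetical wrapper around the listener's node stack.
class NodeStack {
public:
    void push(std::shared_ptr<BaseNode> node) { m_stack.push(std::move(node)); }

    // Checks the dynamic type before popping, so a push/pop order mistake
    // throws instead of yielding a mis-typed pointer (as static_pointer_cast would).
    template <typename T>
    std::shared_ptr<T> popAs() {
        if (m_stack.empty()) throw std::logic_error("popAs: stack is empty");
        auto node = std::dynamic_pointer_cast<T>(m_stack.top());
        if (!node) throw std::logic_error("popAs: unexpected node type");
        m_stack.pop();
        return node;
    }

    std::size_t size() const { return m_stack.size(); }

private:
    std::stack<std::shared_ptr<BaseNode>> m_stack;
};

// Same logic as exitAssign: children come off in reverse order of their pushes.
void buildAssign(NodeStack& nodes) {
    auto rhs = nodes.popAs<AssignRHS>();
    auto lhs = nodes.popAs<AssignLHS>();
    assert(rhs && lhs);  // popAs never returns null; it throws instead
    nodes.push(std::make_shared<Assign>(std::move(lhs), std::move(rhs)));
}
```

A forgotten push is not detected at the point of the mistake, but it typically surfaces as an empty-stack or unexpected-type error when the parent rule's exit method pops its children.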

I will use this implementation for now, but if people who use ANTLR 4 in C++ prefer a better method, please do let me know.

1 Answer

To parse an arithmetic expression, I preferred to use a listener that overrides the "exit" methods:

class MyListener final : public FormulaBaseListener {
public:
   void exitUnaryOp(FormulaParser::UnaryOpContext *ctx) override;
   void exitLiteral(FormulaParser::LiteralContext *ctx) override;
   void exitCell(FormulaParser::CellContext *ctx) override;
   void exitBinaryOp(FormulaParser::BinaryOpContext *ctx) override;

   std::vector<astToken> getResult();
};

#include "antlr4-runtime.h"
#include "FormulaLexer.h"
#include "FormulaParser.h"

int main() {
    antlr4::ANTLRInputStream input(".........");

    FormulaLexer lexer(&input);
    BailErrorListener error_listener; // custom: public antlr4::BaseErrorListener with a syntaxError override
    lexer.removeErrorListeners();
    lexer.addErrorListener(&error_listener);

    antlr4::CommonTokenStream tokens(&lexer);
    FormulaParser parser(&tokens);
    // Stop at the first syntax error (BailErrorStrategy throws) instead of recovering.
    parser.setErrorHandler(std::make_shared<antlr4::BailErrorStrategy>());
    parser.removeErrorListeners();

    FormulaParser::MainContext* tree = parser.main();

    MyListener listener; // custom: final : public FormulaBaseListener overriding the exit*(...) methods
    antlr4::tree::ParseTreeWalker::DEFAULT.walk(&listener, tree);
    std::vector<astToken> ast = listener.getResult(); // get what you want
}
Kaktuts
  • Apologies for not making it clear: the stack approach in edit 2 was done using a listener similar to yours. The problem I was having (which I think is also present in your implementation) was getting a more meaningful structure back. A vector of AST nodes/tokens could certainly be useful, but it isn't a tree. – josh-stackoverflow Apr 06 '23 at 15:16
  • Yeah, this is pretty much the obvious way. It's hard to say what is better without knowing your grammar and end goal. Play with the enter and exit methods, work out the logic, and build a structure suited to further use. In my case, for an arithmetic expression, a vector of tokens was enough for the subsequent calculations. – Kaktuts Apr 06 '23 at 15:50
  • By the way, have you tried grun for tree visualization? – Kaktuts Apr 06 '23 at 15:58
  • Yes, I have. Without dumping the entire contents of my grammar: I am trying to create a small compiler for a custom language and a custom target. I wanted to build a proper AST (i.e. not the parse tree created by ANTLR) so I can do real traversals over it (semantic checking, code generation passes, etc.), and was wondering what the correct approach is. – josh-stackoverflow Apr 06 '23 at 16:40
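Once the AST exists, passes such as semantic checking or code generation are often written as a visitor over your own node classes, independent of ANTLR. A minimal sketch of the classic double-dispatch pattern, assuming you add an accept method to each node; AstVisitor and DeclaredNames are hypothetical names, and Declare is trimmed to just its name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Skip;
struct Declare;

// Each pass implements one visit method per concrete node type.
struct AstVisitor {
    virtual ~AstVisitor() = default;
    virtual void visit(Skip&) = 0;
    virtual void visit(Declare&) = 0;
};

struct Stat {
    virtual ~Stat() = default;
    virtual void accept(AstVisitor& v) = 0;
};

struct Skip : Stat {
    void accept(AstVisitor& v) override { v.visit(*this); }
};

struct Declare : Stat {
    explicit Declare(std::string n) : name(std::move(n)) {
        assert(!name.empty());  // the grammar guarantees an Ident
    }
    void accept(AstVisitor& v) override { v.visit(*this); }
    std::string name;
};

// Example pass: collects declared names, e.g. as a first step towards a symbol table.
struct DeclaredNames : AstVisitor {
    std::vector<std::string> names;
    void visit(Skip&) override {}
    void visit(Declare& d) override { names.push_back(d.name); }
};
```

Adding a new pass means writing one more AstVisitor subclass; adding a new node type means adding one visit overload to every pass, which the compiler enforces because the methods are pure virtual.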