3

I've been trying to implement the Visitor Pattern to parse some specific SQL Statements into an internal object structure consisting of TableDefinition and ColumnDefinition objects.

This is a small (stripped down) portion from the grammar:

column_definition
 : column_name datatype? column_constraint*
 ;
 
column_constraint
 : ( K_CONSTRAINT name )?
   ( K_PRIMARY K_KEY ( K_CLUSTERED | K_NONCLUSTERED )? ( K_ASC | K_DESC )? K_AUTOINCREMENT?
   | K_AUTOINCREMENT
   | K_NOT? K_NULL
   )
 ;

datatype
 : K_CHAR ( '(' unsigned_integer ')' )?                                     #char
 | K_DATE                                                                   #date
 ;

And here is one of the derived BaseVisitors which is meant to return ColumnDefinitions:

namespace SqlParser.Visitor
{
    public class DataTypeVisitor: SqlAnywhereParserBaseVisitor<ColumnDefinition>
    {
        public override ColumnDefinition VisitColumn_definition([NotNull] SqlAnywhereParser.Column_definitionContext context)
        {
            var res = VisitChildren(context);
            var constraint = (SqlAnywhereParser.Column_constraintContext[])context.column_constraint();

            if (res != null) // Add NULL attributes
            {
                if (constraint.Any(c => c.K_NULL() != null && c.K_NOT() == null))
                    res.IsNullable = true;

                if (constraint.Any(c => c.K_NULL() != null && c.K_NOT() != null))
                    res.IsNullable = false;
            }

            return res;
        }

        public override ColumnDefinition VisitChar([NotNull] SqlAnywhereParser.CharContext context)
        {
            return new ColumnDefinition()
            {
                DataType = DbType.StringFixedLength,
                Length = int.Parse(context.unsigned_integer()?.GetText() ?? "1") 
            };
        }
   }
}

When I debug the process, I can observe how the call to VisitChildren goes into VisitChar which returns a ColumnDefinition object. When VisitChar completes and the cursor jumps back to continue in VisitColumn_definition the variable res is null.

Did I miss something crucial or did I misunderstand the visitor pattern? Before I tried VisitChildren I used to use the base.VisitColumn_definition(context) call, which basically only calls VisitChildren.

Does anyone have a hint, which mistakes I made? Why doesn't my ColumnDefinition result created at the VisitChar leaf bubble up?

Below is my testinput:

CREATE TABLE "DBA"."pbcattbl" (
    "pbt_tnam"                       char(129) NOT NULL
   ,"pbt_tid"                        char(5) NULL
);
Adrian
  • 91
  • 5
  • Are you asking why isn’t the VisitChar returning an object instead of null? Or why is the res var null, but not the VisitChar return value? – aleksander_si Jul 06 '20 at 16:50
  • You get no result because if you debug the runtime, VisitChildren() calls `result = AggregateResult(result, childResult);` for each child, and AggregateResult() just returns childResult. So, you get the result of the last child for a parse tree node unless you override the Visit() method for that node. Call the Visit....() methods of the children instead of VisitChildren(context), and compute the value returned. Also, I always build a local copy of the runtime with Debug so I do not need to rely of reading the documentation that who knows is correct. Never trust documentation. – kaby76 Jul 06 '20 at 19:26
  • to aleksander_si: I'm asking why res = null after descending down to visitChar. || to Bart Kiers: I already thought about that, but assumed the default implementation just bubbles up results from the bottom. || @kaby76: Thank you for hint hint. I read about AggregateResult and assumed, that will just give me a collection of child results, instead of the last one. I will give it a try to build a local copy to see what exactly happens. – Adrian Jul 06 '20 at 19:57
  • @Adrian added an example – Bart Kiers Jul 07 '20 at 06:53

2 Answers2

3

I found a solution:

protected override List<ColumnDefinition> AggregateResult(List<ColumnDefinition> aggregate, List<ColumnDefinition> nextResult)
{
    if (aggregate != null && nextResult != null) aggregate.AddRange(nextResult);
    return aggregate ?? nextResult;
}

I converted the result to List<ColumnDefinition> and added an appropriate override to AggregateResult.

Thank you @kaby76 for pointing me into the right direction with your comment. Also thanks to all others for the feedback and quick responses!

Adrian
  • 91
  • 5
1

You'll have to override all Visit...(... context) calls (at least all the ones that your parse tree has in it). Let's say you have this grammar:

grammar T;

parse
 : expr EOF
 ;

expr
 : expr ( '*' | '/' ) expr #multExpr
 | expr ( '+' | '-' ) expr #addExpr
 | NUMBER                  #numberExpr
 ;
 
NUMBER
 : [0-9]+ ( '.' [0-9]+ )?
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

And you're parsing the expression "42". Then it's not sufficient to override just the method VisitNumberExpr(TParser.NumberExprContext context):

using System;
using Antlr4.Runtime;

namespace AntlrTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var lexer = new TLexer(CharStreams.fromstring("42"));
            var parser = new TParser(new CommonTokenStream(lexer));
            var root = parser.parse();
            var evaluated = new CustomVisitor().Visit(root);
            
            Console.WriteLine($"evaluated: {evaluated}");
        }
    }

    class CustomVisitor : TBaseVisitor<decimal>
    {
        public override decimal VisitNumberExpr(TParser.NumberExprContext context)
        {
            return decimal.Parse(context.GetText());
        }
    }
}

It will return the default 0. In this case, you should also override VisitParse(TParser.ParseContext context):

class CustomVisitor : TBaseVisitor<decimal>
{
    public override decimal VisitParse(TParser.ParseContext context)
    {
        return Visit(context.expr());
    }
    
    public override decimal VisitNumberExpr(TParser.NumberExprContext context)
    {
        return decimal.Parse(context.GetText());
    }
}

which now returns 42.

If you don't want to override/implement too many rules, you could use a listener instead:

using System;
using Antlr4.Runtime;
using Antlr4.Runtime.Tree;

namespace AntlrTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var lexer = new TLexer(CharStreams.fromstring("42"));
            var parser = new TParser(new CommonTokenStream(lexer));
            var root = parser.parse();
            var listener = new CustomListener();
            
            ParseTreeWalker.Default.Walk(listener, root);
            
            Console.WriteLine($"Result: {listener.Result}");
        }
    }

    class CustomListener : TBaseListener
    {
        public decimal Result { get; private set; }

        public override void EnterNumberExpr(TParser.NumberExprContext context)
        {
            Result = decimal.Parse(context.GetText());
        }
    }
}
Bart Kiers
  • 166,582
  • 36
  • 299
  • 288
  • ty for the example. I understood the concept but that makes the use of the "BaseClass" very limited, when I have to override every node. In that case my code bloats up pretty hard, since the parser has several abstractions above the create_table rule. Also the other rules need other return types which then leads me to make more than one visitor, with all those overrides. All in all it seems simpler to just collect the results in public class variables, but in that case I lose context when multiple rules add information to one column_definition, in this case the NullableOption. – Adrian Jul 07 '20 at 07:12
  • Then use a listener instead. That can easily be used to "listen" for a single rule. – Bart Kiers Jul 07 '20 at 07:17
  • @Adrian I also included an example of a listener. – Bart Kiers Jul 07 '20 at 07:33
  • In case of a listener I still have the missing context problem: create_table (could be more than one) -> contains column_definitions (definately more than one) + column constraints -> column_definitions create instance and add base information -> one level above should enrich column_definition with constraint information. – Adrian Jul 07 '20 at 07:52
  • An aditional question: when I take the route to override all rules, from my understanding the datatype rule should never be visited, because all alternatives are labeled, is that correct? – Adrian Jul 07 '20 at 07:55
  • "the datatype rule should never be visited" I don't know: I'd have to see the entire grammar to be sure. – Bart Kiers Jul 07 '20 at 08:22
  • "In case of a listener I still have the missing context problem" you could combine listeners and visitors: in a listener you could instantiate a visitor that traverses only part of a sub-tree. Anyway, my answer still stands: use a visitor for a complete walk, use a listener for a partial walk. – Bart Kiers Jul 07 '20 at 08:25
  • The concrete question was why "res" is null. I've tracked down that all overrides were present (with your help that assured me I dont need to override datatype). The missing link was to use a List as result and override the AggregateResult function. I've added an answer with my findings. Thank you for your support and clearing up some facts! – Adrian Jul 07 '20 at 09:06