
I am new to ANTLR. I have a list of functions, most of which can be nested.

Below are the examples for functions:

1. Function.add(Integer a,Integer b)
2. Function.concat(String a,String b)
3. Function.mul(Integer a,Integer b)

Suppose the input is:

Function.concat(Function.substring(String,Integer,Integer),String)

Using ANTLR with a Java program, how do I define the grammar and validate that the function names are correct and that the parameter counts and data types are correct? The check has to be recursive, since the functions can be deeply nested.

validate test class:

public class FunctionValidate {

public static void main(String[] args) {

    FunctionValidate fun = new FunctionValidate();
    fun.test("FUNCTION.concat(1,2)");  

}

private String test(String source) {
    CodePointCharStream input = CharStreams.fromString(source);
    return compile(input);
}

private String compile(CharStream source) {
    MyFunctionsLexer lexer = new MyFunctionsLexer(source);
    CommonTokenStream tokenStream = new CommonTokenStream(lexer);
    MyFunctionsParser parser = new MyFunctionsParser(tokenStream);
    FunctionContext tree = parser.function();
    ArgumentContext tree1= parser.argument();
    FunctionValidateVisitorImpl visitor = new FunctionValidateVisitorImpl();
    visitor.visitFunction(tree);
    visitor.visitArgument(tree1);
    return null;
}

}

Visitor impl:

public class FunctionValidateVisitorImpl extends MyFunctionsParserBaseVisitor<String> {

    @Override
    public String visitFunction(MyFunctionsParser.FunctionContext ctx) {
        String function = ctx.getText();
        System.out.println("------>"+function);
        return null;
    }


    @Override
    public String visitArgument(MyFunctionsParser.ArgumentContext ctx){
        String param = ctx.getText();
        System.out.println("------>"+param);
        return null;
    }


}

The statement System.out.println("------>"+param); does not print the argument; it prints only ------>.

ashok
  • You could look at any grammar in the [grammar repository at Github](https://github.com/antlr/grammars-v4), which supports common expression syntax with functions (e.g. JS, C++, Java and many more). – Mike Lischke Oct 12 '19 at 08:20
  • Can you suggest how to write a grammar for checking nested functions? – ashok Oct 13 '19 at 18:52
  • If no operation allows mixing different data types, then it is enough to define them separately in the grammar. For parameter count, just hardcode the number of arguments in the grammar. However, even if it may be possible to check such things syntactically, it is poor practice and it doesn't work with more realistic languages; these should be static checks. – effeffe Oct 15 '19 at 13:45

1 Answer


This task can be accomplished in two main steps:

1) Parse the given input and build an Abstract Syntax Tree (AST).

2) Traverse the tree and validate each function and each argument, one after another, using the Listener or the Visitor pattern.

Fortunately, ANTLR provides tools for implementing both steps.
Here's a simple grammar I wrote based on your example. It does recursive parsing and builds the AST. You may want to extend its functionality to meet your needs.

Lexer:

lexer grammar MyFunctionsLexer;

FUNCTION: 'FUNCTION';

NAME: [A-Z]+;

DOT: '.';

COMMA: ',';

L_BRACKET: '(';

R_BRACKET: ')';

WS : [ \t\r\n]+ -> skip;

Parser:

parser grammar MyFunctionsParser;

options {
    tokenVocab=MyFunctionsLexer;
}

function : FUNCTION '.' NAME '('(argument (',' argument)*)')';

argument: (NAME | function);
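
To make the recursion in these two rules easier to see, here is a minimal hand-written sketch of the same structure in plain Java. It is not what ANTLR generates; the class name MiniFunctionParser and its methods are hypothetical, and as an assumption it accepts letters of either case in names, which is looser than the NAME rule above.

```java
public class MiniFunctionParser {
    private final String src;
    private int pos = 0;

    private MiniFunctionParser(String src) { this.src = src; }

    // Entry point: parses one complete function call and rejects trailing input.
    static String parse(String input) {
        MiniFunctionParser p = new MiniFunctionParser(input);
        String result = p.function();
        if (p.peek() != '\0') throw p.error("end of input");
        return result;
    }

    // function : FUNCTION '.' NAME '(' argument (',' argument)* ')' ;
    private String function() {
        if (!name().equals("FUNCTION")) throw error("FUNCTION");
        expect('.');
        String fn = name();
        expect('(');
        StringBuilder out = new StringBuilder(fn).append('(').append(argument());
        while (peek() == ',') {
            pos++;
            out.append(',').append(argument());
        }
        expect(')');
        return out.append(')').toString();
    }

    // argument : NAME | function ;  (this is where the recursion happens)
    private String argument() {
        int start = pos;
        String word = name();
        if (word.equals("FUNCTION")) {
            pos = start;           // rewind and parse a nested call instead
            return function();
        }
        return word;
    }

    // NAME: one or more ASCII letters (assumption: either case is allowed).
    private String name() {
        peek();                    // skips leading whitespace, like WS -> skip
        int start = pos;
        while (pos < src.length() && isAsciiLetter(src.charAt(pos))) pos++;
        if (pos == start) throw error("a name");
        return src.substring(start, pos);
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) throw error("'" + c + "'");
        pos++;
    }

    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException("expected " + expected + " near position " + pos);
    }

    public static void main(String[] args) {
        // prints concat(substring(a,b,c),d)
        System.out.println(parse("FUNCTION.concat(FUNCTION.substring(a, b, c), d)"));
    }
}
```

Like the grammar, this sketch only checks structure. It also rejects digits, so an input such as FUNCTION.concat(1,2) fails here unless the name rule is extended.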

An important thing to notice here: the parser does not distinguish between valid and invalid (from your point of view) functions, arguments, numbers of arguments, etc. So a function like Function.whatever(InvalidArg) is also a valid construct from the parser's point of view. To validate the input further and test whether it meets your requirements (a predefined list of functions and their arguments), you have to traverse the tree using a Listener or a Visitor (I think a Visitor fits here perfectly).
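
The validation pass described above could look roughly like the following sketch. To keep it runnable without the generated ANTLR classes, it uses a hypothetical Node class as a stand-in for the parse-tree contexts; in a real visitor the same logic would live in visitFunction and visitArgument. The signature table uses the parameter types from the question, while the return types are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SignatureCheck {

    // Hypothetical stand-in for a parse-tree node: either a function call
    // with arguments, or a leaf naming a data type such as "String".
    static final class Node {
        final String name;
        final boolean isCall;
        final List<Node> args;
        Node(String name, boolean isCall, Node... args) {
            this.name = name;
            this.isCall = isCall;
            this.args = Arrays.asList(args);
        }
    }

    static Node call(String name, Node... args) { return new Node(name, true, args); }
    static Node type(String name) { return new Node(name, false); }

    // Return type first, then the parameter types.
    static final class Signature {
        final String returns;
        final List<String> params;
        Signature(String returns, String... params) {
            this.returns = returns;
            this.params = Arrays.asList(params);
        }
    }

    // Parameter types come from the question; return types are assumed.
    static final Map<String, Signature> SIGNATURES = new HashMap<>();
    static {
        SIGNATURES.put("add", new Signature("Integer", "Integer", "Integer"));
        SIGNATURES.put("mul", new Signature("Integer", "Integer", "Integer"));
        SIGNATURES.put("concat", new Signature("String", "String", "String"));
        SIGNATURES.put("substring", new Signature("String", "String", "Integer", "Integer"));
    }

    // Returns the node's type (null if unknown) and records problems in errors.
    static String typeOf(Node node, List<String> errors) {
        if (!node.isCall) return node.name;
        Signature sig = SIGNATURES.get(node.name);
        if (sig == null) {
            errors.add("unknown function: " + node.name);
            for (Node arg : node.args) typeOf(arg, errors); // still check nested calls
            return null;
        }
        if (sig.params.size() != node.args.size()) {
            errors.add(node.name + ": expected " + sig.params.size()
                    + " argument(s), got " + node.args.size());
        }
        for (int i = 0; i < node.args.size(); i++) {
            String actual = typeOf(node.args.get(i), errors); // recursion into nested calls
            if (i < sig.params.size() && actual != null && !sig.params.get(i).equals(actual)) {
                errors.add(node.name + ": argument " + (i + 1) + " should be "
                        + sig.params.get(i) + ", got " + actual);
            }
        }
        return sig.returns;
    }

    static List<String> validate(Node root) {
        List<String> errors = new ArrayList<>();
        typeOf(root, errors);
        return errors;
    }

    public static void main(String[] args) {
        Node ok = call("concat",
                call("substring", type("String"), type("Integer"), type("Integer")),
                type("String"));
        Node bad = call("concat", call("add", type("Integer"), type("Integer")), type("String"));
        System.out.println(validate(ok));
        System.out.println(validate(bad));
    }
}
```

Each call returns its result type to the caller, so a nested call is checked against the parameter slot it fills. That is what lets the check work at any nesting depth.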

To get a better understanding of what it is, I'd recommend reading this and this. But if you want to get deeper into the subject, you should definitely look at the "Dragon Book", which covers the topic exhaustively.

Pavel Smirnov
  • The tree produced by ANTLR is a *parse tree*, not an *AST*. – effeffe Oct 15 '19 at 13:36
  • @effeffe, I know. Anyway, for this kind of task it makes no big difference which tree is used. So, to keep the answer as simple as possible and not bother with "parse tree to AST conversion", I prefer not to make the distinction and to leave it as is, since there is much more material about ASTs on the Internet than about parse trees. And for the OP it'll be easier to get the basic idea of what traversing a tree is and how to use it to collect the required information. – Pavel Smirnov Oct 15 '19 at 15:10
  • I understand there's no big difference here, I just don't see the value of using the wrong terminology. Anyway, in the parser grammar I think you forgot to use some token names you defined in the lexer one. – effeffe Oct 15 '19 at 16:10
  • @effeffe, those tokens are: `. , ( )` and they're used in the parser, just in plain text. ANTLR allows this substitution and replaces them with the right token from the lexer. – Pavel Smirnov Oct 15 '19 at 16:16
  • Here is another question that I have posted related to this; can you please have a look: ANTLR nested function check parameter datatype https://stackoverflow.com/q/58368354/9814870 – ashok Oct 15 '19 at 19:06
  • IMHO that's asking for problems. The nice thing about explicit lexical rules is you're protected from typos ('functon' vs 'function') and get reusability (if you use the same token in different places you only have one definition), but you lose both with implicit tokens. Plus I'm not sure why you sometimes use explicit tokens (`FUNCTION`) and some times not (`'.'`). This looks like bad practice to me: if you want to keep it simple, then don't define tokens at all. My $0.02. – effeffe Oct 16 '19 at 10:54
  • @effeffe, it's a common practice to use single (or double) char tokens in plain text to enhance code readability (`'(' expression ')'` is much better than `L_BRACKET expression R_BRACKET`). The chance of a typo in this case is minimal. The logic is simple: `'.'` is shorter than `DOT`, so I'll use it, but `'FUNCTION'` is longer than just `FUNCTION`, so I prefer the latter one. – Pavel Smirnov Oct 16 '19 at 12:13
  • I have updated the code using the lexer and parser you provided. I have not implemented a visitor yet, but any function I give as input produces an error. Could you please provide a simple example of a visitor for this? – ashok Oct 17 '19 at 14:14
  • ANTLR Tool version 4.4 used for code generation does not match the current runtime version 4.7.2ANTLR Tool version 4.4 used for code generation does not match the current runtime version 4.7.2line 1:0 token recognition error at: 'a' line 1:1 token recognition error at: 'd' line 1:2 token recognition error at: 'd' line 1:4 token recognition error at: '1' line 1:3 mismatched input '(' expecting 'FUNCTION' line 1:6 token recognition error at: '2' null – ashok Oct 17 '19 at 14:20
  • @vikram, the first error says that the parser has been compiled using ANTLR 4.4 (older), but the runtime version is 4.7.2 (newer). You should either compile the parser in ANTLR 4.7.2 as well, or use ANTLR 4.4 jar-file at the runtime. The second error says `line 1:1 token recognition error at: 'd'`. What input do you provide? It expects `Function...`, but there's something starting with `d`. – Pavel Smirnov Oct 17 '19 at 14:23
  • @vikram, also note, that the parser is case-sensitive. So it expects `FUNCTION`, not `Function`. You can either change this behavior by modifying `FUNCTION` token or use a custom char stream. You can find examples of both methods [here](https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md). – Pavel Smirnov Oct 17 '19 at 14:28
  • Sorry, I hadn't noticed that you mentioned only lowercase after the dot. It's working. – ashok Oct 17 '19 at 14:28
  • Thank you!! The System.out.println("------>"+function); statement is not printing. – ashok Oct 17 '19 at 14:32
  • Change `ParseTree tree = parser.function();` to `FunctionContext tree = parser.function();` and call `visitor.visitConcat(tree)` instead. – Pavel Smirnov Oct 17 '19 at 15:10
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/201066/discussion-between-vikram-and-pavel-smirnov). – ashok Oct 18 '19 at 04:59