I've been trying to build a function: concat('A','B') OR concat('A',9)

Here is a sample grammar I have written:

    LPAREN : '(' ;
    RPAREN : ')' ;
    FUNCTIONNAME : 'CONCAT' ;
    ARGUMENTS : TEXT (',' TEXT)* ;
    TEXT : ('a'..'z' | '0'..'9' | 'A'..'Z')+; 
    allFunction : FUNCTIONNAME LPAREN ARGUMENTS (',' ARGUMENTS)* RPAREN ;

But I am not able to build the tree properly.

Update1:

Here is the Tree:

      0 null
    -- 11 CONCAT
    -- 4 (
    -- 13 2,5
    -- 5 )

and the grammar:

    allFunction : FUNCTIONNAME LPAREN ARGUMENTS RPAREN;

Update2:

Grammar:

    allfunction : COMMA | FUNCTIONNAME LPAREN ARGUMENTS (COMMA ARGUMENTS)* RPAREN ;

Parsed output:

    CONCAT(A,B,C)

    [@0,0:5='CONCAT',<8>,1:0]
    [@1,6:6='(',<1>,1:6]
    [@2,7:11='A,B,C',<9>,1:7]
    [@3,12:12=')',<2>,1:12]
    [@4,13:14='\n\n',<7>,1:13]
    [@5,15:14='<EOF>',<-1>,3:0]

Update3:

I have been trying to build a function: CONCAT(TEXT,TEXT) (input limited to 2 parameters). This works fine. I have also implemented an IF function: IF(TEXT,TEXT,TEXT). This works fine too.

The problem is that I have to change it to IF(BOOLEAN,INT,INT), but with the existing grammar every parameter of IF, including the first one, accepts UNSIGNED_INT.

Grammar:

Here is the link: https://ufile.io/undqs or https://files.fm/u/7c44aaee

Bond
  • Please give an example of the input you want to parse. "not able" doesn't help much. Which errors do you get? Which tokens are produced with `grun allFunction -tokens -diagnostics`? – BernardK Sep 19 '17 at 11:25
  • @BernardK Unable to set Concat as a root element. Also, 2,5 should be different nodes. – Bond Sep 19 '17 at 11:31
  • You have the same problem with ARGUMENTS as with STRUCTURE_SELECTOR in [link](https://stackoverflow.com/questions/46256834/how-to-make-antlr4-fully-tokenize-terminal-nodes/46258041#46258041). And again, what is the input file? Is Tree the input or the output you want to build? – BernardK Sep 19 '17 at 11:37
  • @BernardK As you see, [@2,7:11='A,B,C',<9>,1:7] – Bond Sep 19 '17 at 13:18

1 Answer

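You should not create a lexer rule ARGUMENTS. This is something the parser should handle: a lexer rule that contains the commas turns the whole argument list into a single token, which is why you see 'A,B,C' as one node. And the parameters should probably not be TEXT tokens, but some sort of expressions, so that CONCAT(CONCAT(A, B), C) also works.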
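
To make the tokenizing problem concrete, here is a rough Python sketch that imitates how a lexer picks tokens: the longest match wins, and on a tie the rule declared first wins. The regexes are stand-ins for the lexer rules in the question, and `tokenize`, `RULES` and `SPLIT_RULES` are names made up for this illustration. It is not ANTLR itself, only an approximation of the behaviour.

```python
import re

# Regex stand-ins for the lexer rules in the question, in declaration order.
# This is an approximation for illustration, not ANTLR.
TEXT = r"[a-zA-Z0-9]+"
RULES = [
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("FUNCTIONNAME", r"CONCAT"),
    ("ARGUMENTS", TEXT + r"(?:," + TEXT + r")*"),
    ("TEXT", TEXT),
]


def tokenize(source, rules):
    """Split source into (rule, text) pairs: longest match wins, first rule wins ties."""
    tokens, pos = [], 0
    while pos < len(source):
        best_name, best_text = None, ""
        for name, pattern in rules:
            match = re.compile(pattern).match(source, pos)
            # Only a strictly longer match replaces the current best,
            # so a rule declared earlier keeps the token on a tie.
            if match and len(match.group()) > len(best_text):
                best_name, best_text = name, match.group()
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


# With ARGUMENTS as a lexer rule, the whole list becomes one token.
tokens = tokenize("CONCAT(A,B,C)", RULES)

# Hypothetical alternative: drop ARGUMENTS, add a COMMA token,
# and leave the grouping of arguments to a parser rule.
SPLIT_RULES = [rule for rule in RULES if rule[0] != "ARGUMENTS"] + [("COMMA", r",")]
split_tokens = tokenize("CONCAT(A,B,C)", SPLIT_RULES)

print(tokens)
print(split_tokens)
```

With the first rule set the third token is `A,B,C` as a whole, matching the token dump in Update2. With the second rule set each argument and each comma is its own token, which is what a parser rule needs in order to build separate nodes.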

Something like this would be a good start:

    grammar T;

    parse
     : expression EOF
     ;

    expression
     : expression 'AND' expression
     | expression 'OR' expression
     | function
     | bool
     | NUMBER
     | TEXT
     | ID
     ;

    function
     : ID '(' arguments? ')'
     ;

    arguments
     : expression ( ',' expression )*
     ;

    bool
     : TRUE
     | FALSE
     ;

    TRUE         : 'true';
    FALSE        : 'false';
    NUMBER       : ( [0-9]* '.' )? [0-9]+;
    ID           : [a-zA-Z_] [a-zA-Z0-9_]*;
    TEXT         : '\'' ~[\r\n']* '\'';
    SPACE        : [ \t\r\n]+ -> skip;

When parsing your input like this, you can parse any function with any number of parameters of any type. E.g. it will parse both CONCAT('a','b') and IF(false,1,42). But note that it will also parse IF(false,1,42,1,1,1,1,1,1,1,1,1,1). So after parsing is finished, you can walk your parse tree and validate that every function has the proper number of parameters of the correct type.
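
As a rough idea of what such a validation pass could look like, here is a minimal Python sketch. It does not use the ANTLR runtime: `Node` is a made-up stand-in for a parse-tree node, and the `SIGNATURES` table (including the choice that CONCAT takes two text arguments) is an assumption for illustration only. In a real project the same checks would live in a visitor or listener over the classes ANTLR generates.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node (not an ANTLR class)."""
    kind: str    # "function", "bool", "number" or "text"
    value: str   # function name, or the literal's text
    args: list = field(default_factory=list)


# Assumed signature table: function name -> (parameter kinds, result kind).
SIGNATURES = {
    "CONCAT": (("text", "text"), "text"),
    "IF": (("bool", "number", "number"), "number"),
}


def validate(node, errors):
    """Return the kind the node evaluates to, appending any problems to errors."""
    if node.kind != "function":
        return node.kind
    # Check nested calls first so their result kinds are known.
    arg_kinds = [validate(arg, errors) for arg in node.args]
    if node.value not in SIGNATURES:
        errors.append(f"unknown function {node.value}")
        return None
    expected, result = SIGNATURES[node.value]
    if len(arg_kinds) != len(expected):
        errors.append(
            f"{node.value} expects {len(expected)} arguments, got {len(arg_kinds)}"
        )
    else:
        for index, (want, got) in enumerate(zip(expected, arg_kinds), start=1):
            # A kind of None means the argument already produced an error.
            if got is not None and got != want:
                errors.append(
                    f"{node.value} argument {index} should be {want}, got {got}"
                )
    return result


def problems(node):
    """Convenience wrapper: collect all validation errors for a tree."""
    errors = []
    validate(node, errors)
    return errors


good = Node("function", "IF",
            [Node("bool", "false"), Node("number", "1"), Node("number", "42")])
wrong_type = Node("function", "IF",
                  [Node("number", "7"), Node("number", "1"), Node("number", "42")])
too_many = Node("function", "IF",
                [Node("bool", "false")] + [Node("number", "1")] * 12)

print(problems(good))
print(problems(wrong_type))
print(problems(too_many))
```

The grammar stays permissive, and the rules about argument counts and types sit in one table that is easy to extend when new functions are added.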

Also, is there any way to edit the parse tree?

See: How to rewrite Antlr4 Parse Tree manually?

Bart Kiers
  • Thanks for the reply. When I try to build this grammar I get this error: error(210): The following sets of rules are mutually left-recursive [expression] – Bond Sep 19 '17 at 13:38
  • Yes, just looked it up: `error(210)` is an error from version 3 of ANTLR. That version will not work with the grammar I posted. Use version 4 instead. – Bart Kiers Sep 19 '17 at 14:08
  • That's correct @Bart. Just considering another scenario: what if I need to limit my function's input to two parameters only? – Bond Sep 19 '17 at 16:07
  • Then just accept 2 expressions as argument: `function : FUNCTIONNAME '(' expression ',' expression ')' ;` – Bart Kiers Sep 19 '17 at 19:01
  • Can you explain the purpose of 'OR'? – Bond Sep 20 '17 at 09:01
  • You mentioned the `OR` token in your example: `concat('A','B') OR concat('A',9)`. It's like the logical OR: `||` in many programming languages. – Bart Kiers Sep 20 '17 at 09:02
  • Agreed. Can you please give some input on Update 3 of the question? Thanks again. – Bond Sep 21 '17 at 09:35
  • You should do as I suggested earlier: don't limit to `TEXT`, `BOOLEAN` or `INT`, but define an `expression` that matches all these things. Then when parsing is done, in a visitor or listener, perform semantic checks that validate that something is actually text, or number etc. – Bart Kiers Sep 21 '17 at 09:56
  • So all we need to check is the number of parameters in the grammar and, while walking the parse tree, the parameter types. Is that correct? – Bond Sep 21 '17 at 09:59
  • Also, is there any way to edit the parse tree? Because I don't know much about visitors or listeners. – Bond Sep 21 '17 at 10:05
  • Again @Mark, thanks for your valuable input. I am now walking the parse tree with my custom listener. – Bond Sep 22 '17 at 13:13