I am trying to define a simple grammar for C to use with Lark. I defined the closing delimiters ("}" and ")") as terminals in the grammar, but parsing fails with the error "No terminal matches ')' in the current parser context". In many example grammars that I have seen, the closing parenthesis is defined as a terminal in the same way. How do I resolve this issue? Here is the code:

from lark import Lark
g=r'''start : header_files
preprocessor_commands : "#include" | definition
definition : def string (header_files)* program
def : "#define" | "#undef" | "#ifdef" | "#ifndef" | "#if" | "#else" | "#elif" | "#endif" | "#error" | "#pragma"
header_files : preprocessor_commands file_names (header_files)* program
file_names : "<stdio.h>" | "<math.h>" | "<conio.h>" | "<stdlib.h>" | "<string.h>" | "<ctype.h>" | "<time.h>" | "<float.h>" | "<limits.h>" | "<wctype.h>"

program : data_type func "(" var ")" "{" (codeblock) close_par | data_type func "()" "{" (codeblock) close_par
data_type : "void" | "int" | "float" | "double" | "long" | "char" | "string" | "long long" | "unsigned_int"
func : "main" | string
var : string  

codeblock : "return" var term | "return" const term | "return" func term | "return" "(" expressions ")" term | declarations | expressions | statements | call | print 

declarations : data_type var assign const ";" (declarations)* codeblock | data_type var assign var ";" (declarations)* codeblock | data_type var ("," var)* ";" (declarations)* codeblock

expressions : arithmetic | bitwise | assignment

arithmetic : add | sub | mul | div | mod | unary
add : const "+" (arithmetic)* | var "+" (arithmetic)*
sub : const "-" (arithmetic)* | var "-" (arithmetic)*
mul : const "*" (arithmetic)* | var "*" (arithmetic)*
div : const "/" (arithmetic)* | var "/" (arithmetic)*
mod : const "%" (arithmetic)* | var "%" (arithmetic)*
unary : inc | dec
inc : "++" var | var "++"
dec : "--" var | var "--"

bitwise : and | or | xor | boc | ls | rs
and : const "&" (bitwise)* | var "&" (bitwise)*
or : const "|" (bitwise)* | var "|" (bitwise)*
xor : const "^" (bitwise)* | var "^" (bitwise)*
boc : var assign "~" const | var assign "~" var
ls : const "<<" const | var "<<" const
rs : const ">>" const | var ">>" const

assignment : assign | "*=" | "/=" | "%=" | "+=" | "-=" | "<<=" | ">>=" | "&=" | "^=" | "|="
assign : "="

statements : if | switch | loop

if : ("if" "(" logical close_par codeblock)+ ("elseif" "(" logical close_par codeblock)* ["else" codeblock]

logical : land | lor | lnot | equ | gre | les | greq | leeq | neq
land : const "&&" (logical)* | var "&&" (logical)*
lor : const "||" (logical)* | var "||" (logical)*
lnot : "!" (logical)+ | "!" (arithmetic)+ | "!" var | "!" const
equ : const "==" (logical)* | var "==" (logical)*
gre : const ">" (logical)* | var ">" (logical)*
les : const "<" (logical)* | var "<" (logical)*
greq : const ">=" (logical)* | var ">=" (logical)*
leeq : const "<=" (logical)* | var "<=" (logical)*
neq : const "!=" (logical)* | var "!=" (logical)*
 
switch : "switch" ("(" expressions ")") "{" (switch_case)* ["default" ":" codeblock] close_par
switch_case : "case" const ":" codeblock (switch_case)*

loop : for | while | do_while
for : "for" "(" [[data_type] var assign const] ";" [logical] ";" [arithmetic] ")" "{" codeblock "}" | "for" "(" [[data_type] var assign const] ";" [logical] ";" [arithmetic] ")" codeblock
while : "while" "(" logical ")" "{" codeblock "}" | "while" "(" logical ")" codeblock
do_while : "do" "{" codeblock "}" "while" "(" logical ")"

call : func "(" var ("," var)* ")" term | var assign func "(" var ("," var)* ")" term | func "(" ")" term | var assign func "(" ")" term

print : "printf" "(" (dcstring)* ")" term

%import common.SIGNED_NUMBER
const : SIGNED_NUMBER
term : ";" codeblock
tpar : ")" | "}" | ")" codeblock

digit : "0".."9"
nz_dig : "1".."9"
integer : (digit)* (nz_dig)+ | "-" (digit)* (nz_dig)+
decimal : (digit)+ "." (digit)+ | "-" (digit)+ "." (digit)+
letter : "a".."z" | "A".."Z"
char : letter | SIGNED_NUMBER
string : /[a-zA-Z0-9_.-]{2,}/ | (char)*
dcstring : /"[^"]*"/

close_par : "}" | ")" | "]"
WHITESPACE: " " | "\t" | "\f" | "\n"
%ignore WHITESPACE+

COMMENT: "//" /[^\n]/* | "/*" /(\S|\s)*?/ "*/"
%ignore COMMENT  
'''
parser=Lark(grammar=g,parser="earley")
code='''#include<stdio.h>
#define PI 3.14
void main()
{
  int a,b; long c=1;
  if(a==b || b==c)
  return 2;
}
'''
print(parser.parse(code).pretty())

This is the error: "No terminal matches ')' in the current parser context".

  • Please show your code. – Sumner Evans Nov 22 '21 at 20:18
  • @SumnerEvans I have updated the post. The rules might seem a bit lengthy. – ThunderLord Nov 22 '21 at 20:32
  • I defined the closing parenthesis in a rule named 'close_par' – ThunderLord Nov 22 '21 at 20:38
  • You might consider porting [this](https://github.com/antlr/grammars-v4/blob/master/c/C.g4) Antlr grammar to Lark. Try using the [Lark IDE](https://www.lark-parser.org/ide/) instead. Your grammar has many problems: incomplete and improperly handled preprocessor directives; non-standard expression grammar rules; useless parentheses surrounding a single grammar symbol; use of `[]`-operators instead of the `?`-operator; use of tail-recursion in a rule, e.g., `header_files`/`definition` each ref'ing `program`; confusing the widely-used term "string" with "identifier". – kaby76 Nov 24 '21 at 17:22
  • I decided to port the Antlr4 C grammar to Lark since I'm writing conversions for many parser generators. The Lark version of the C grammar is [here](https://github.com/kaby76/Domemtech.TrashBase/blob/main/supported_grammars/lark/examples/C2.lark). Note that you really should preprocess the input .c file using "gcc -E" and use the result as the input to the parser. I haven't checked this much, and I have noticed that the GNU extensions aren't correct in the original grammar. – kaby76 Nov 24 '21 at 23:32
