
We are facing performance issues with ANTLR when parsing Oracle files. The Oracle files to be converted are quite large: 17, 24, and 38 MB. Building the parse tree takes a lot of time and memory, to the point of crashing with an OutOfMemoryError. We tried disabling parse-tree construction, but that does not work: the walker then has no tree to traverse and produces a blank output file. We tried a BufferedInputStream in place of FileInputStream, and we tried BufferedTokenStream, UnbufferedCharStream, and UnbufferedTokenStream in place of the usual lexer and parser streams. None of these options work; building and traversing the parse tree still takes far too much memory and time. We tried running with 2 GB of heap as well, but it goes beyond that and dies with an OOM.
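For reference, this is roughly how we wired up the unbuffered streams (OracleLexer and OracleParser stand in for our generated lexer and parser classes; per the ANTLR documentation the token factory must copy text, because the char buffer is discarded as the stream advances):

import java.io.FileInputStream;
import org.antlr.v4.runtime.*;

public class StreamSetup {
    static OracleParser openParser(String path) throws Exception {
        // Unbuffered char stream: keeps only a sliding window of the file in memory.
        CharStream input = new UnbufferedCharStream(new FileInputStream(path));
        OracleLexer lexer = new OracleLexer(input);
        // copyText=true so each token keeps its text after the buffer slides past it.
        lexer.setTokenFactory(new CommonTokenFactory(true));
        TokenStream tokens = new UnbufferedTokenStream<CommonToken>(lexer);
        return new OracleParser(tokens);
    }
}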

From the online forums this seems to be a very common problem when ANTLR parses large input files. The suggested alternatives are either to break the input down into multiple small files, or to leave listeners and visitors aside and create objects directly from actions in the grammar, storing them in hash maps or vectors. A sketch of the first idea follows.
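Since our files are essentially long runs of statements terminated by ';', one variant of the split-the-input idea we are considering is splitting on statement boundaries and parsing each statement with a fresh parser, so each small tree can be garbage-collected. A naive sketch (it would mis-split on semicolons inside string literals or PL/SQL blocks, and parseOneStatement is a placeholder for building a lexer/parser over just that chunk):

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StatementSplitter {
    public static void main(String[] args) throws Exception {
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            StringBuilder stmt = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                stmt.append(line).append('\n');
                // Treat a line ending in ';' as a statement boundary (naive).
                if (line.trim().endsWith(";")) {
                    parseOneStatement(stmt.toString()); // fresh lexer/parser per statement
                    stmt.setLength(0);                  // let the previous tree be collected
                }
            }
        }
    }

    static void parseOneStatement(String sql) {
        // Placeholder: build a CharStream over just this statement and parse it.
    }
}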

Are there any good examples or references where setBuildParseTree = false is used? Can the ANTLR4 Java parser handle very large files, or can it stream them? Is it possible to parse a big file with ANTLR at all?
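In particular, is something like the sketch below the intended way to use it? addParseListener is supposed to fire callbacks during the parse itself, so no tree would be needed afterwards. The rule and listener names here (sql_script, Insert_statementContext, OracleParserBaseListener) are guesses based on a typical generated grammar:

import org.antlr.v4.runtime.TokenStream;

public class NoTreeParse {
    static void parseWithoutTree(TokenStream tokens) {
        OracleParser parser = new OracleParser(tokens);
        parser.setBuildParseTree(false);      // keep no tree in memory
        parser.addParseListener(new OracleParserBaseListener() {
            @Override
            public void exitInsert_statement(OracleParser.Insert_statementContext ctx) {
                // Fires as soon as each INSERT finishes parsing; translate/emit it here.
                // With buildParseTree=false the context has no children, so collect any
                // token text in visitTerminal()/enter callbacks, not from the subtree.
            }
        });
        parser.sql_script();                  // top-level rule of the grammar
    }
}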

Have you encountered such ANTLR problems in the past, and if so, how were they handled? Any ANTLR-specific suggestions for reducing the memory footprint and making the parse faster?
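One thing we have seen recommended but not yet tried is two-stage parsing: run with the faster SLL prediction mode and retry with full LL only if that fails. Would something like this help? (A sketch against the standard runtime API; the rewind needs a buffered token stream, so it cannot be combined with UnbufferedTokenStream, and sql_script is again our guessed start rule.)

import org.antlr.v4.runtime.BailErrorStrategy;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.atn.PredictionMode;
import org.antlr.v4.runtime.misc.ParseCancellationException;

public class TwoStageParse {
    static void parse(OracleParser parser, CommonTokenStream tokens) {
        parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
        parser.setErrorHandler(new BailErrorStrategy()); // bail out instead of recovering
        try {
            parser.sql_script();
        } catch (ParseCancellationException e) {
            // Genuinely ambiguous or erroneous input: rewind and retry with full LL.
            tokens.seek(0);
            parser.reset();
            parser.getInterpreter().setPredictionMode(PredictionMode.LL);
            parser.sql_script();
        }
    }
}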

The input files mostly contain SELECT and INSERT statements, but in very large volume. A representative sample:

INSERT INTO crmuser.OBJECT_CONFIG_DETAILS(
ATTRIBCONFIGID,OBJCONFIGID,ATTRIBNAME,PARENTNAME,ISREQUIRED
,ISSELECTED,READACCESS,WRITEACCESS,DEFAULTLABEL,CONFIGLABEL
,DATATYPE,ISCOMPOSITE,ISMANDATORY,ATTRIBSIZE,ATTRIBRANGE
,ATTRIBVALUES,ISWRITABLE)
VALUES ( 
91933804, 1682878, 'ACCOUNTS_EXTBO.RELATIVE_MEMBER_ID', 'ACCOUNTS_EXTBO', 
'N', 'Y', 'F', 'F', 'ACCOUNTS_EXTBO.RELATIVE_MEMBER_ID', 
'ACCOUNTS_EXTBO.RELATIVE_MEMBER_ID', 'String', 'N', 'N', 50, 
null, null, 'N')
;

INSERT INTO crmuser.OBJECT_CONFIG_DETAILS(
ATTRIBCONFIGID,OBJCONFIGID,ATTRIBNAME,PARENTNAME,ISREQUIRED
,ISSELECTED,READACCESS,WRITEACCESS,DEFAULTLABEL,CONFIGLABEL
,DATATYPE,ISCOMPOSITE,ISMANDATORY,ATTRIBSIZE,ATTRIBRANGE
,ATTRIBVALUES,ISWRITABLE)
VALUES ( 
91933805, 1682878, 'ACCOUNTS_EXTBO.ELIGIBILITY_CRITERIA', 'ACCOUNTS_EXTBO', 
'N', 'Y', 'F', 'F', 'ACCOUNTS_EXTBO.ELIGIBILITY_CRITERIA', 
'ACCOUNTS_EXTBO.ELIGIBILITY_CRITERIA', 'String', 'N', 'N', 50, 
null, null, 'N')
;
  • I'm running into the same issue. I'm trying to consider a way to automatically break my large file into "sub tasks". Generally, once I reach a certain point in the file, I'm looking at individual ASCII records which all parse to a subset of the rules. I could flush the parse tree at the end of each of these records. I'm translating data. I'm trying to avoid moving my Listener functions back into the grammar. – Ross Youngblood Feb 25 '17 at 01:32
