I have to unit test scripts for a postgres(PG) derivative dialect called AWS redshift that has a few features that are not backward compatible with PG. I am unit testing redshift SQL code on PG by lexing out the unwanted things from it using regexes. At the moment, reg ex complexity is just about manageable(DISTKEY SORTKEY, DISTSTYLE AND ENCODE). On the request of my fellow code reviewers on the complexity of the regexes and their brittleness, I started looking at alternative viable approaches and I stumbled upon using ANTLR. The idea would be to generate and use a lexer instead of my regexes. Here are a few general questions you could help me with when you read the sample plan at the end of this thread.
- Is there a simpler approach to the plan which I outline below? Should I stick to my regex plan for the moment as a good compromise?
- If using ANTLR for the lexer is indeed doable then please read on and help me answer some of the questions that come below.
I have looked at simple example in this thread for inspiration, but PG is a big beast. So instead I am trying to do some effort estimation for this; This exercise could be valuable to people if they are estimating the "bang for the buck" for taking this route.
I have a sample plan below that represents the bill of work (Assuming I use ANTLR for being language agnostic):
Borrow some ANTLR Grammar for PG. I came across this ANTLR PG git project. I am getting started with installing and running it now. Is the code in that project sufficient for my purposes? What is that code missing? ....
Add Redshift specific words. This should not be too hard
Generate all the visitors and listener
Implement the visitors(?)
Could I do something simple to just return the parsed tokens in the visitors. Because I only
want to lex out the Redshift specific things?
Lex out redshift in the listener to by(Is this just returning something like a "" string when a redshift specific token is encountered?)
Run against some sample SQL code to test it will lex out redshift things and I can test the resulting redshift code on a PG instance.