0

I have to unit test scripts for a postgres(PG) derivative dialect called AWS redshift that has a few features that are not backward compatible with PG. I am unit testing redshift SQL code on PG by lexing out the unwanted things from it using regexes. At the moment, reg ex complexity is just about manageable(DISTKEY SORTKEY, DISTSTYLE AND ENCODE). On the request of my fellow code reviewers on the complexity of the regexes and their brittleness, I started looking at alternative viable approaches and I stumbled upon using ANTLR. The idea would be to generate and use a lexer instead of my regexes. Here are a few general questions you could help me with when you read the sample plan at the end of this thread.

  • Is there a simpler approach to the plan which I outline below? Should I stick to my regex plan for the moment as a good compromise?
  • If using ANTLR for the lexer is indeed doable then please read on and help me answer some of the questions that come below.

I have looked at simple example in this thread for inspiration, but PG is a big beast. So instead I am trying to do some effort estimation for this; This exercise could be valuable to people if they are estimating the "bang for the buck" for taking this route.

I have a sample plan below that represents the bill of work (Assuming I use ANTLR for being language agnostic):

  1. Borrow some ANTLR Grammar for PG. I came across this ANTLR PG git project. I am getting started with installing and running it now. Is the code in that project sufficient for my purposes? What is that code missing? ....

  2. Add Redshift specific words. This should not be too hard

  3. Generate all the visitors and listener

  4. Implement the visitors(?)

    Could I do something simple to just return the parsed tokens in the visitors. Because I only

    want to lex out the Redshift specific things?

  5. Lex out redshift in the listener to by(Is this just returning something like a "" string when a redshift specific token is encountered?)

  6. Run against some sample SQL code to test it will lex out redshift things and I can test the resulting redshift code on a PG instance.

Community
  • 1
  • 1
Sid
  • 420
  • 1
  • 6
  • 11
  • 2
    If your goal is to use Postgres to testrun the redshift-scripts (ignoring the extensions) , why not extend the PG syntax (it is fully readable Bison/Flex) to make it accept the extensions (and do nothing with them) ? – wildplasser Nov 19 '14 at 19:30
  • The thought had crossed my mind. The only reason I wanted to try use ANTLR was because it provided a language independent hooks and my code is in python. – Sid Nov 19 '14 at 19:43
  • PostgreSQL uses bison and flex as the parser generator, so that seems like a no-brainer place to start. – Craig Ringer Nov 20 '14 at 00:20

0 Answers0