0

OK, this might seem too tough to be posted here so I beg your pardon. Been working on this for almost a week.

I need to extract all selected columns in a given Oracle SQL String. It should pass the following test cases:

// single column test
select col1 from dual
    // ^ should match "col1"

// multiple column test
select col1,col2 from dual
    // ^ should match "col1", "col2"

// multiple space test
select   col1   ,  col2   from   dual
    // ^ should match "col1", "col2"

// "distinct" tests
select distinct col1 from dual
    // ^ should match "col1"
select distinct col1, col2 from dual
    // ^ should match "col1", "col2"

// "distinct" with whitespaces tests
select   distinct   col1   from   dual
    // ^ should match "col1"
select   distinct   col1  ,  col2  from   dual
    // ^ should match "col1", "col2"

// "as" tests
select col1 from dual
    // ^ should match "col1"
select colA as col1 from dual
    // ^ should match "col1"
select colA as col1, col2, col3 from dual
    // ^ should match "col1", "col2", "col3"
select col1, colB as col2, col3 from dual
    // ^ should match "col1", "col2", "col3"
select col1, col2, colC as col3 from dual
    // ^ should match "col1", "col2", "col3"

// "as" tests with whitespaces tests
select    colA    as    col1,    colB    as    col2,    colC    as    col3    from    dual
    // ^ should match "col1", "col2", "col3"


// "distinct" with "as" tests
select distinct colA as col1 from dual
    // ^ should match "col1"
select distinct colA as col1, colB as col2, col3 from dual
    // ^ should match "col1", "col2", "col3"
select distinct colA as col1, col2, colC as col3 from dual
    // ^ should match "col1", "col2", "col3"


// function test
select funct('1','2') as col1 from dual
    // ^ should match "col1"
select col1, funct('1','2') as col2 from dual
    // ^ should match "col1", "col2"
select col1, colB as col2, funct('1','2') as col3 from dual
    // ^ should match "col1", "col2", "col3"

I tried the following RegEx in Java

 ((?<=select\ )(?!distinct\ ).*?(?=,|from))
 ((?<=select\ distinct\ ).*?(?=,|from))
 ((?<=as\ ).*?(?=,|from))
 ((?<=,\ ).*?(?=,|from))(?!.*\ as\ ) // <- Right, I'm guessing here

OR-ed them together but I can't simply pass all the test cases above. (I'm using this tool to validate my Regex).

I tried searching for SQL evaluator but can't find any that extracts all columns without executing it against a real database and that assumes all referenced tables and functions exist.

A Java ReGex, a free SQL Evaluator (that doesn't need a real database) that can pass the tests, or anything better that those two are the acceptable answers. The assumption is that the SQL is always in Oracle 11g format.

supertonsky
  • 2,563
  • 6
  • 38
  • 68
  • doesn't Oracle llg format also include the structure of `tablename.columnname` and using quotation marks to make column names with spaces, or do you know that your select statements have a more constrained syntax? – Peter Elliott Jan 15 '13 at 05:07
  • Yes, I do know that it's supposed to have more than the given tests. I just put it there to give an idea about what I'm intending to do. I'm guessing that if somebody could give a regex that passes on those test cases, it might be possible for me to derive solutions from it to pass all the remaining test cases not specified here. – supertonsky Jan 15 '13 at 05:11

1 Answers1

1

Given that Oracle SELECT lists can get pretty complex (having to account for all of the cases you mention, plus subqueries, tablename.columnname constructs, quoted aliases, etc), you probably want to go beyond Regular Expressions and actually parse the SQL query then pull the tokens out of the parsed output.

To that end, you have a couple of different options, none of which are all that easy, but may be able to solve your problem

  • If you're willing to use Perl, you can probably make SQL::Parser do what you want.
  • You can get a 90 free trial download of gsqlparser if you want a java-based solution, which would be helpful if this is a one-time project.
  • There is this - SQL92 parser, which is a free download but of unknown license, and I'm not totally sure if it can handle any Oracle-specific weirdness.
  • you can use Antlr to generate a SQL parser with a java interface based on this guy's work, which is based on CREATE TABLE syntax but can be adapted readily to handle SELECT syntax (or you can search for antlr sql grammar and find a premade one pretty easily)
Peter Elliott
  • 3,273
  • 16
  • 30
  • I guess this is the best answer I could possibly get. I was thinking of a JDBC driver that accepts Oracle SQL that can execute SQL without having a real database connection and would return a ResutlSet. From there I could possibly get all column names. I googled and found out Hibernate is using ANTLR. Maybe I should try my luck in there. Much appreciated. – supertonsky Jan 15 '13 at 05:55