0

I am doing some string parsing on SQL queries and have come to a bit of a crossroads on identification and extraction of nested queries. In the case of:

select name, wins
from mlb_team
where wins > (select avg(wins) from mlb_team) and league = 'NL'
order by wins desc

I would like to extract (select avg(wins) from mlb_team). A more global issue arises in the case of a nested query which contains the keyword that follows the keyword from the parent query of the nested query. i.e.

select columns
from table
where column_val > (select avg(column) from table where conditional order by column asc) and league = 'NL'
order by wins desc

The nested query contains the order by clause as does the parent, so searching for that keyword alone does not work to make this code more universal. I would like to be able to put in any query and have it extract a nested query.

Tips or suggestions would help. I'm currently using a lot of regex to pull clauses out and single table queries process fine (i.e. I'm getting each individual clause by itself) but with nested queries I am hitting a wall.

Thanks guys!

1 Answers1

0

In your case you could work with this RegEx:

\(select.+(desc|asc|\)|$)

With a recursive function you could parse further subqueries.

BUT i would highly recommend to NOT parse SQL with regular expression and as @RC suggested use an already build SQL parser. You open a never ending story, in which you will always develop on your parser. (Depending on the DBMS or ANSI SQL or or or, just dont do it)

deterministicFail
  • 1,271
  • 9
  • 28
  • 1
    I would use an SQL parser except this is for a tool a professor and I are building as part of a research project. We made the choice of coding it from the ground up so it can be maintained by students down the road since the tool we are building is meant for classroom teaching. We don't want to rely on outside code to be maintained well and up to date with what we always need so we are choosing to write everything from scratch. Thank you for that regex as well. The caveat is within a query, there may be parenthesis in string literals and I don't want it to grab these as the end parenthesis. – Nicholas Pierce Dec 01 '15 at 20:48