2

I'm trying to maintain someone else's code and would like to put them in order and fin out the relations between the files and the tables in the database. I have many ASP file, with most of them using VB script. I would like to extract the SQL queries from the files. My thought was to use regular expression, but I can't seem to find the correct pattern to use. (or maybe I'm just not searching with the correct key words?)

For example, I have the below code:

sql="select id from employee where no='"&no&"'"
rs.Open sql,smoothconn,adOpenStatic,adLockReadOnly 
if not rs.EOF then
  if sql1 <> "" then
    sql1 = sql1 & " or "
  end if     
  sql1 = sql1 & " (a.id='"&rs("id")&"')"
end if
rs.Close

chgdate = ""
sqlh = ""
sql="select a.chg, b.oldvalue, e.enter from employeechange a left join changeitem b on a.id=b.changeid " &_
  "left join employee e on a.id=e.id " &_
  "where (e.no like '"&no&"%' or e.name like '"&no&"%' or e.name1 like '"&no&"%') and b.field='id' " &_
  "order by a.chg "
'Response.Write sql & "<br>"    

I would like to get the SQL queries within this code. I would like to get at least the following:

select id from employee where no='"&no&"'

"select a.chg, b.oldvalue, e.enter from employeechange a left join changeitem b on a.id=b.changeid " &_
  "left join employee e on a.id=e.id " &_
  "where (e.no like '"&no&"%' or e.name like '"&no&"%' or e.name1 like '"&no&"%') and b.field='id' " &_
  "order by a.chg "

Basically, I need to know the action (INSERT/UPDATE/DELETE) and the tables involved.

Here are some things I think is important when generating the pattern:

1. can be INSERT, UPDATE, or DELETE
2. sql in string concatanation, and can be single-line or multi-line
3. no upper limits of number of tables involved (joined)
4. select can be * or specific column(s)

Here are some of the patterns but none of them seem to work correctly.

(SELECT|UPDATE|INSERT)[\s\S]+?\;\s*?$
(INSERT INTO|UPDATE|SELECT|DELETE)(?:[^;]|(?:'.*?'))+;\\s*
(SELECT|UPDATE|INSERT)[\s\S]+?\;\s*?$

I really appreciate the help.

sora0419
  • 2,308
  • 9
  • 39
  • 58
  • Common 'regex' engines are actually more powerful than recognizers for regular languages. In particular, PCRE engines recognize context-free languages and stray into the context-sensitive realm (see [this SO answer and its featured link](http://stackoverflow.com/a/11382641)). Moreover, there are are sql dialects which are context-free languages. In particular, [this grammar](http://savage.net.au/SQL/sql-99.bnf.html) for sql 99 is said to be context-free (which i have not verified). This needn't hold neither for the sql dialect you are working with nor for the embedding in vbscript / asp. Howev – collapsar May 19 '15 at 00:07
  • Regex can't actually parse arbitrary SQL, as it is not a regular language. The problem here is that you have quoted SQL, quotes within your SQL, and possibly quotes within your quotes within your quoted SQL. Regex can't tell what a quote means because it doesn't have the ability to keep track of how many quotes you've seen or what the recursive depth of the current quote-stack is. You'd need at least a context-free parser to do that. – Skylor Schermer May 18 '15 at 02:25

1 Answers1

-2

This colossus works for your sample input plus a delete and an insert:

/"(?:(select)(?:[^"]|"(?=[ \n\r\w]))*?from (\w+)(?:(?:[^"]|"(?=[ \n\r\w]))*?join (\w+))?(?:(?:[^"]|"(?=[ \n\r\w]))*?join (\w+))?(?:(?:[^"]|"(?=[ \n\r\w]))*?join(\w+))?(?:(?:[^"]|"(?=[ \n\r\w]))*?join (\w+))?|(update) (\w+)|(delete) from (\w+)|(insert) into (\w+)).*?"(?=[\n\r])/im

See live demo

It works by reluctantly matching between quotes, where the opening quote is followed by an sql command word and the closing quote is followed by a newline, and uses the "ignore case" flag (to match "SELECT" and "select") and "dot all" flag (and allow the match to flow across lines).

The SQL action/verb is the first non-blank group, and the individual tables affected are captured in the group(s) that immediately follow the verb. This handles up to 4 joins - just add more copies of the join match to enable more a greater number of joined tables to be captured.

Bohemian
  • 412,405
  • 93
  • 575
  • 722