I have attempted to use RegEx to partially validate Oracle SQL, but recently rediscovered a question about parsing HTML that made me doubt my choices.
I created the following stings (with help from SO and colleagues):
// C# code
// Find any semicolons with data afterwards
var match = Regex.Match(sql, @"^(.*;[ \t]*[\w\-;]+[\w\s\-]*$)", RegexOptions.Multiline);
// Find any multiline comments
match = Regex.Match(sql, @"/\*.*?\*/", RegexOptions.Singleline);
// Find any non-nested sql commands with a blank line inside them
match = Regex.Match(sql, @"(grant|(?<!\(\s*)select|insert|delete|update)\s+[^;]*(\r\n|\r(?!\n)|(?<!\r)\n)\s*(\r\n|\r(?!\n)|(?<!\r)\n)[^;]*;", RegexOptions.IgnoreCase);
These are implemented in an integration test suite that does a first-pass on all sql scripts to check for obvious errors - scripts that work in Oracle SQL Developer, but will not run using sqlplus in a linux environment.
My question is: are there any libraries or other techniques I should use instead - have I heeded the call of Cthulu or is my soul safe?