0

I have attempted to use RegEx to partially validate Oracle SQL, but recently rediscovered a question about parsing HTML that made me doubt my choices.

I created the following stings (with help from SO and colleagues):

// C# code

// Find any semicolons with data afterwards
var match = Regex.Match(sql, @"^(.*;[ \t]*[\w\-;]+[\w\s\-]*$)", RegexOptions.Multiline);

// Find any multiline comments
match = Regex.Match(sql, @"/\*.*?\*/", RegexOptions.Singleline);

// Find any non-nested sql commands with a blank line inside them
match = Regex.Match(sql, @"(grant|(?<!\(\s*)select|insert|delete|update)\s+[^;]*(\r\n|\r(?!\n)|(?<!\r)\n)\s*(\r\n|\r(?!\n)|(?<!\r)\n)[^;]*;", RegexOptions.IgnoreCase);

These are implemented in an integration test suite that does a first-pass on all sql scripts to check for obvious errors - scripts that work in Oracle SQL Developer, but will not run using sqlplus in a linux environment.

My question is: are there any libraries or other techniques I should use instead - have I heeded the call of Cthulu or is my soul safe?

Community
  • 1
  • 1
Ruskin
  • 5,721
  • 4
  • 45
  • 62
  • whats your setting? is this part of an app? how difficult would it be to actually run the scripts in an uncommitted session? – Jafar Kofahi Jan 28 '15 at 13:47
  • It would be easier end less error-prone to take for instance [this SQLite grammar](https://github.com/antlr/grammars-v4/blob/master/sqlite/SQLite.g4) and rework it to match Oracle's syntax. With regex you'll have a hard time to validate nested queries (it's not impossible, only there are better ways). – Lucas Trzesniewski Jan 28 '15 at 13:53
  • We have a c# app that has various possible data layers. When a client using Oracle ran the installation scripts they had errors which we did not when using Oracle SQL Developer. These are implemented as unit / integration tests on build to catch the specific errors the client had. i.e. on the source sql scripts. – Ruskin Jan 28 '15 at 14:08

0 Answers0