2

We use the RoundhousE migration tool in one of our more SQL oriented projects. I have stubble upon a very strange bug

A certain SQL function script (Sadly I cant supply the script because its the property of my customer) makes the Replace method of RegEx never return

The regex looks like this

(?<KEEP1>^(?:[\s\t])*(?:-{2}).*$)|(?<KEEP1>/{1}\*{1}[\S\s]*?\*{1}/{1})|(?<KEEP1>'{1}(?:[^']|\n[^'])*?'{1})|(?<KEEP1>\s)(?<BATCHSPLITTER>GO)(?<KEEP2>\s)|(?<KEEP1>\s)(?<BATCHSPLITTER>GO)(?<KEEP2>$)

The code line in RounhousE that never returns

string sql_statement_scrubbed = regex_replace.Replace(sql_to_run, match => evaluate_and_replace_batch_split_items(match, regex_replace));

The problem is not in the delegate evaluate_and_replace_batch_split_items its in the actual regex.Replace method, i've tried the regex in a simple regex tool and it also hangs. Maybe someone here that are a guru on RegEx can see what the problem is?

edit: If I remove the ' (Apostrophe) from this sql comment -- If no previous, don't report revised it works, but its not only that it must be a combination of other text in the script because that line works on its own

Anders
  • 17,306
  • 10
  • 76
  • 144
  • It looks like the apostrophe in your comment is somehow interpreted as the beginning of a quoted string, and the regex takes forever to find the corresponding, ending quote for that "string". My regex (see below) should be less susceptible to the exponential increase in complexity that such a situation might bring, but it might still match the wrong text - therefore a question: Can there ever be a multiline string, i. e. a string that starts with `'` and ends with `'` but that contains newlines between the two? – Tim Pietzcker Nov 26 '12 at 11:51
  • Missed this comment, potentially you could have 'I dont't like this synstax' and with newlines – Anders Nov 27 '12 at 09:54
  • So in essence, invalid strings (because you can't have an unescaped apostrophe in a string that is itself delimited by apostrophes)? Then regular expressions will fail just as any language parser would fail with a Syntax Error. – Tim Pietzcker Nov 27 '12 at 10:01
  • Sorry Im tired, i meant '' (Escaped apostrophe), but it looks like your regex can handle that, the Unittests for RundhousE all returns green and my failing script works, thats good enough for me at least :D – Anders Nov 27 '12 at 10:10
  • To reliably handle escaped apostrophes, you do need a slightly more complex regex. I've edited my answer accordingly. – Tim Pietzcker Nov 27 '12 at 10:15

1 Answers1

2

Usually, when a regex takes forever to match (or, more likely, find out that it doesn't match) it's because of catastrophic backtracking. There are a few instances in your regex that might be prone to this, depending on what your input looks like. I have taken your regex and cleaned it up a little, removing lots of unnecessary quantifiers and alternations. This regex:

(?<KEEP1>^\s*--.*$)|(?<KEEP1>/\*[\S\s]*?\*/)|(?<KEEP1>'[^']*')|(?<KEEP1>\s)(?<BATCHSPLITTER>GO)(?<KEEP2>\s|$)

will match exactly the same as your old regex, but it's less complex and should be more stable. Please give it a try.

To handle escaped apostrophes within strings ('It\'s something else!') correctly, you would need to change the regex:

(?<KEEP1>^\s*--.*$)|(?<KEEP1>/\*[\S\s]*?\*/)|(?<KEEP1>'(?:\\.|[^'\\])*')|(?<KEEP1>\s)(?<BATCHSPLITTER>GO)(?<KEEP2>\s|$)
Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561