1

I have a class I use to "split" a string of SQL commands by a batch separator - e.g. "GO" - into a list of SQL commands that are run in turn etc.

...
private static IEnumerable<string> SplitByBatchIndecator(string script, string batchIndicator)
{
    string pattern = string.Concat("^\\s*", batchIndicator, "\\s*$");
    RegexOptions options = RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Multiline;
    foreach (string batch in Regex.Split(script, pattern, options))
    {
        yield return batch.Trim();
    }
}

My current implementation uses a Regex with yield but I am not sure if it's the "best" way.

  • It should be quick
  • It should handle large strings (I have some scripts that are 10mb in size for example)
  • The hardest part (that the above code currently does not do) is to take quoted text into account

Currently the following SQL will incorrectly get split:

var batch = QueryBatch.Parse(@"-- issue...
insert into table (name, desc)
values('foo', 'if the
go
is on a line by itself we have a problem...')");

Assert.That(batch.Queries.Count, Is.EqualTo(1), "This fails for now...");

I have thought about a token based parser that tracks the state of the open closed quotes but am not sure if Regex will do it.

Any ideas!?

Paul Kohler
  • 2,684
  • 18
  • 31
  • Now that is an evil problem, which is also giving me a headache. I'm working on https://github.com/timabell/sqlHawk which also needs to support batches. I'm half inclined to just say tough and that all the scripts must be split by GO on a line on it's own in uppercase and all variants will be ignored. Otherwise I'm down to full SQL parsing :-/ – Tim Abell May 24 '12 at 09:18
  • GO reference: http://msdn.microsoft.com/en-us/library/ms188037.aspx – Tim Abell May 24 '12 at 09:20
  • An alternative approach for .net people: use a different method of running the script: https://smehrozalam.wordpress.com/2009/05/12/c-executing-batch-t-sql-scripts-with-go-statements/ "2. Use the Server class from SQL Server Management Objects (SMO)" – Tim Abell May 24 '12 at 09:21
  • @Tim Abell - "tough" has been my approach to date! No batch counter or comments... Would probably get away with a partial token parser, quotes and comments for example but its not been a big enough issue for me yet :-) – Paul Kohler May 28 '12 at 02:55

1 Answers1

3

You can track the opening and closing quotes using a Balancing Group Definition.

Also, a similar question was asked last year about splitting on whitespace as long as the whitespace wasn't contained in quotes. You might be able to adjust those answers to get where you're going.

Community
  • 1
  • 1
E.Z. Hart
  • 5,717
  • 1
  • 30
  • 24
  • The 'balancing group definition' was close but not quite what I needed - the problem was you need pairs of quotes so it doesn't fit well with a commented out single quote in the SQL for example. Nice method none the less! – Paul Kohler Apr 24 '10 at 00:57