0

Please consider this string:

string mySelects = @"SELECT * FROM TABLE1 WHERE 1=1
SELECT T.* 
FROM (SELECT * FROM TABLE2 WHERE 2=2) T
ORDER BY FOO
SELECT FIELD1, FIELD3
FROM TABLE3
GROUP BY FIELD";

Is there any regular expression to return a MatchCollection with these three Selects in each result?

Thanks in advance!

Riera
  • 369
  • 1
  • 2
  • 12
  • 7
    For such analysis you better use a SQL parser... regex is not the right tool... – Yahia Oct 29 '13 at 15:27
  • 3
    Regexes are good for parsing languages without context, sql is more complicated. You will not be able to find a regex which gives you a good result in all cases. – Peter Oct 29 '13 at 15:29
  • I think it might be quite simple: give me all 'Select .*' expressions before another 'select' or end of string, ignoring 'select .*' between brackets. Any help? – Riera Oct 29 '13 at 16:26

2 Answers2

2

There is no regex like that, and it is not possible to write one: the regex programming model is not powerful enough to deal with counting. It is not powerful enough even to balance parentheses in an expression - something that would be required to capture the third select in your example. The fact that .NET's regex engine supports balancing groups is not going to help you much: regex is hopelessly inadequate for the task at hand.

You need something more powerful than the regex - preferably, a complete SQL parser. I have tried this parser (registration is required), it worked fine on many queries. It also has a source in case you wish to adjust it to your needs.

Community
  • 1
  • 1
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • -1 Because of [balancing groups](http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition). It's one of the reasons why I downvoted the other answer; you guys are getting tied up in the formal definition of regular languages when C# regexes aren't at all regular. – NullUserException Oct 29 '13 at 15:47
  • 1
    @NullUserException +1 for dasblinkenlight. The whole question is regarding parsing SQL and no *regular expression engine* is powerful enough to do so. I'm certain, that OP doesn't want to dwell into formal definitions of regular expression and grammars, so we included just enough theory for him to understand, why what is he trying to do is impossible. – Spook Oct 29 '13 at 16:16
  • @dasblinkenlight To be honest, I have not. But I'm just pointing out there are features in C#'s regexes that make it formally non-regular. BTW I've removed the downvote. – NullUserException Oct 29 '13 at 16:51
1

There is none.

Mathematically speaking, a regular expression describes some set of strings (these, which match this regular expression). Similarly a grammar describes a set of strings (which match this grammar). But grammars are a superset over regular expressions, because every regular expression can be described by a grammar, but unfortunately not the other way around.

SQL is a language, which is described by some grammar. But this grammar is way too complicated to be described by a regular expression. You need some more mathematical firepower to process this language.

The solution is to seek for a ready-made SQL parser or write one using tools like ANTLR.

Spook
  • 25,318
  • 18
  • 90
  • 167
  • @dasblinkenlight What do you mean by a "spite vote"? – Spook Oct 29 '13 at 15:37
  • @dasblinkenlight I see. Tough luck :) – Spook Oct 29 '13 at 15:43
  • -1 for a couple of reasons. First because *"a grammar is a superset of a regular expression"* is a statement that doesn't make any sense. A formal grammar describes a formal language. This language can be [regular](http://en.wikipedia.org/wiki/Regular_grammar). Also, no modern regex dialect is truly regular. C# regexes certainly aren't regular. – NullUserException Oct 29 '13 at 15:45
  • Note this isn't an endorsement for using regexes for this job, they aren't a great tool for this if you're starting from scratch (though you should keep in mind ANTLR's lexer uses regexes). I'm merely pointing out the answer contains technically inaccurate statements. If you fix those I'll remove the downvote (cc @dasblinkenlight) – NullUserException Oct 29 '13 at 15:51
  • @NullUserException I rephrased the answer to make it more clear. The overall point is, that no regular expression engine (even with some extensions) is powerful enough to process a SQL grammar. Note, that even previously I noted, that there *are* regular grammars (these, which match regular expressions). BTW, Please consider writing comments for downvotes in the future - if there are some uncertainties, we will be able to fix our answers way faster. – Spook Oct 29 '13 at 16:15
  • @dasblinkenlight You can easily check changes in reputation of the OP and there were no downvotes for our answers :) – Spook Oct 29 '13 at 16:17
  • Where you said "grammar" I think you really meant "non-regular grammar" (eg: [context-free](http://en.wikipedia.org/wiki/Context-free_grammar), [context-sensitive](http://en.wikipedia.org/wiki/Context-sensitive_grammar), etc) as a grammar *can* describe a [regular language](http://en.wikipedia.org/wiki/Regular_grammar). The wording of this answer still doesn't sound quite right, but I'll remove the downvote. – NullUserException Oct 29 '13 at 16:50