0

Im making a regex expression using C# to extract the where clause of a sql expression. The whre clause can have multiple columns with different operators. Though NO grouping is allowed in the sql e.g.

col1 = 5 and (col3 = 6 or col4 < 5)

Only a simple format is allowed in the sql:

col1 = 5 and col1 < 6 or col3 <> ?

I have been trying to do it with the following , but the "and" and "or" keywords seem to be caught, and it doesnt capture all:

.*?(?<columnname>.+?)(?<operator>=|<|>|<>)(?<value>.+?)\s
H4mm3rHead
  • 553
  • 1
  • 12
  • 28

3 Answers3

2

Are you sure you don not want a complete parser that will save you a lot of time (i.e. the time others have put into it)?

Community
  • 1
  • 1
soulmerge
  • 73,842
  • 19
  • 118
  • 155
1

Break down your regex to see why it doesn't work.

.*?(?<columnname>.+?)(?<operator>=|<|>|<>)(?<value>.+?)\s
  • .*? — 0 or more characters, non-greedy.
  • (?<columnname>.+?) — 1 or more characters, which can be anything, non-greedy, capture into columnname.
  • (?<operator>=|<|>|<>) — One of =, <, >, or <>, capture into columnname
  • (?<value>.+?) — 1 or more characters, which can be anything, non-greedy, capture into value
  • \s — Must match a whitespace character here

The only restrictions here are what you expect to find in operator and the single whitespace at the end of the string. Are you feeding the entire query into the regex, or just the string you gave as an example?

If you match this against the example string you gave

col1 = 5 and col1 < 6 or col3 <> ?

you will find that you have a single space in columnname, then an equals sign in operator, then another single space and the number 5 in value.

Making this work as you wish is problematic because you might have spaces in your identifiers. Something like this might work:

.*?(?<columnname>`.+?`|\S+)\s*(?<operator>=|<|>|<>)\s*(?<value>`.+?`|\S+)

The only changes I made here are that columnname and value must either be all non-whitespace, or they must be values surrounded by backticks. Also, there are optional spaces between the captures, and not a required one at the end.

You might consider sharing your C# as well so that we can see how you are invoking the regex against the string, as given the regex will only match one column/operator/value set.

Adam Bellaire
  • 108,003
  • 19
  • 148
  • 163
0

Am I missing something here? "to extract the where clause of a sql expression" Why not simply capture whats between the where keyword and either group by or end of line:

where(.*?)(group|\Z)

ennuikiller
  • 46,381
  • 14
  • 112
  • 137
  • Should probably add order in there too, where(.?)(group|order|\Z) – Mark Sep 10 '09 at 14:31
  • Because im making a general solution for a particular source system (Navision) and the source columns are translated into other languages or renamed so almost no two sources are the same. I need to be able to replace the column names with the ones i know fits the particular solution – H4mm3rHead Sep 10 '09 at 14:33