0

I'm trying to parse an sql where I would like to get the where clause of the statement.

Below is the piece of code I have written:

string input = "select * from table where x = 5 and abc = 'p' or def = 1  order by col";
Match match = Regex.Match(input, @"select.*from [a-z]+ where(.*)(?:order by .*)?",RegexOptions.IgnoreCase);

But here the output I get includes the order by statement which I dont want. I get the expected output if I removed last '?', but the input statement might or might not contain order by.

Expected Output: " x = 5 and abc = 'p' or def = 1 "

can you please correct my regex

user1737619
  • 247
  • 1
  • 2
  • 15

4 Answers4

2

Add a first group with order clause then another without.

string input = "select * from table where x = 5 and abc = 'p' or def = 1  order by col";
Match match = Regex.Match(input, @"select.*from [a-z]+ where(?:(.*)(?:order by .*)|(.*))",RegexOptions.IgnoreCase);

Regex is not a good SQL parser and it will fail in many cases. For instance :

select * from table where x = 'order by col'

order by col' will be missing from the match.

Guillaume
  • 12,824
  • 3
  • 40
  • 48
1

Using a regular expression to parse SQL is a recipe for a king size headache. try this:

string input = "select * from table where x = 5 and abc = 'p' or def = 1  order by col";

string output = input.Substring(input.IndexOf(" where ", StringComparison.OrdinalIgnoreCase)+7, input.IndexOf(" order by "), StringComparison.OrdinalIgnoreCase).

Note: you will need to confirm that your sql actually contains both where and order by clauses, but it's fairly simple to do that

Note #2: it might be +6 and not +7, I didn't test the code.

Edit

It's worth mentioning that my suggested solution also suffers from the same drawbacks that Guillaume pointed out in his answer, i.e if the where clause will contain something like x = ' order by blabla' my suggestion will fail as well. However, it's fairly simple to avoid this, simply change input.IndexOf(" order by ", StringComparison.OrdinalIgnoreCase) to input.LastIndexOf(" order by ", StringComparison.OrdinalIgnoreCase). This way you can be sure to get the actual order by clause of your sql statement.

Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • nope, c#. they do look alike and I had some Capitalization mistakes but I've fixed them now. – Zohar Peled Apr 13 '15 at 09:59
  • You should use `IndexOf("str", StringComparison.OrdinalIgnoreCase)` instead of `ToLower` https://msdn.microsoft.com/library/ms224425.aspx – Guillaume Apr 13 '15 at 12:06
  • @Guillaume You are correct, thanks. I've edited my answer once again according to your suggestion. – Zohar Peled Apr 13 '15 at 12:24
0

You can change the code to

string input = "select * from table where x = 5 and abc = 'p' or def = 1  order by col";
Match match = Regex.Match(input, @"select.*from [a-z]+ where(.*?)(?=\s+and|$)", RegexOptions.IgnoreCase);

and it will only capture the where clause up to the next and or the end of the query.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • What if there isn't an `and`? Shouldn't you capture the `and` anyway? Also, you already have `RegexOptions.IgnoreCase` so `(?i)` is not needed here, I'm not sure why you've added it. – Kobi Apr 13 '15 at 08:54
  • I modified the question to have expected output and your output is only one condition of where clause not all – user1737619 Apr 13 '15 at 09:07
0

Try this pattern:

(?<where>(?<=where ).*)(?: order by)

Find the group "where" in the match return. Be sure to match other key words like having or group by.

Dani Ribas
  • 54
  • 3