0

I have the following regex:

(?i:outer|inner ){0,1}(?i:JOIN (tableC|tableG) AS )([a-zA-Z0-9\r\n ~()=<>#_'"-])*(?=(?i:JOIN|WHERE))

I am testing this against this SELECT statement:

SELECT columnA FROM tableB AS B
INNER JOIN tableC AS C ON B.fieldC = C.fieldC 
INNER JOIN tableD AS D on C.fieldD = D.fieldD
WHERE D.fieldE = C.fieldE

Now, I want the regex to match INNER JOIN tableC AS C ON B.fieldC = C.fieldC INNER. However, it matches until the first occurrence of WHERE. INNER JOIN tableC AS C ON B.fieldC = C.fieldC INNER JOIN tableD AS D on C.fieldD = D.fieldD

I am quite confused as I clearly stated in the regex to positively lookahead for a WHERE or a JOIN. However, it seems to be looking up until the last occurrence of any of these alternatives? Can anyone give insight on what's wrong and why is this happening to my regex?

Thanks!

Kawamoto Takeshi
  • 596
  • 1
  • 3
  • 24
  • After the occurrence of `WHERE` has been consumed, the lookahead will fail, hence your current observations. – Tim Biegeleisen Jan 23 '17 at 01:56
  • Hi @TimBiegeleisen, so does that mean that it will keep looking until it 'consumes' the first occurrences of /every/ word in my lookahead? – Kawamoto Takeshi Jan 23 '17 at 01:59
  • The lookahead asserts that, from that point in the regex, either JOIN or WHERE follows. I'm not sure you need a lookahead here. – Tim Biegeleisen Jan 23 '17 at 02:02
  • 2
    I think the problem is with `([a-zA-Z0-9\r\n ~()=<>#_'"-])*`. By default, when you say "match 0 or more of something", the regex matcher will start by looking at the _largest_ possible number of matches. So by working backwards, it finds the `WHERE` first. If you put `?` after `*` in the above, it will start with 0 and keep increasing to find the smallest number of matches, so that it sees the `JOIN` first. Try that and see if it works. – ajb Jan 23 '17 at 02:05
  • Hi @ajb, it now works as expected! I still really have to deep dive about regex. Thank you! – Kawamoto Takeshi Jan 23 '17 at 02:12
  • @ajb that's the answer: reluctant quantifier `*?` instead of greedy `*` – Bohemian Jan 23 '17 at 02:36

0 Answers0