0

I am trying to select every query with SELECT in the start and LIMIT in the end, for example:

$SelQuery = "SELECT DISTICNT C.firstname,C.lastname,C.id as clientid,QC.category_name,QR.cid,QR.catid,QR.rhid
                    FROM cms_question_report QR, cms_clients C,cms_questioncategory QC ,cms_reporthistory RH
                    WHERE C.id=QR.cid
                    AND QR.rhid=RH.id
                    AND QR.catid='".$objArray['catid']."'
                    AND QR.catid=QC.id
                    AND C.id IN($SelClids)
                    LIMIT $page_no,$this->Limit";

And not select any other query that doesn't contain LIMIT. So the regex I thought would work was this:

SELECT .*\n* LIMIT

Where I'm basically stating that it has to start with "SELECT" then any other characters and ends with LIMIT, the only issue is I don't know how to handle the \n*, I have multiple unknown number of new lines and in each line there are unknown characters, so I'm assuming it should go like this \n*.\n.* and so on, but there definitely is a way where I can say select any character including a new line.

I have tried using this solution: Regex to match any character including new lines and adjusted my regex to:

SELECT /s* LIMIT

but that didn't work as well.

Also tried:

SELECT (.|\n)* LIMIT

but it didn't work.

Community
  • 1
  • 1
Naguib Ihab
  • 4,259
  • 7
  • 44
  • 80
  • 2
    Don't parse *SQL* with *regexes*. SQL is a recursive language, and can only be parsed properly with a *context free grammar*... – Willem Van Onsem Apr 10 '15 at 01:18
  • I'm sorry I don't understand what you're referring to. What would SQL being a recursive language have anything to do with this? Regex would see this as plain text, running regex on any kind of text wouldn't make any difference. Correct? – Naguib Ihab Apr 10 '15 at 01:59
  • It seems you are processing source code to extract some queries. Are you using grep? – 1010 Apr 10 '15 at 03:09
  • @1010 I have no idea what you're talking about, but I i'm using this regex in sublime to find these queries and replace them afterwards. – Naguib Ihab Apr 10 '15 at 03:16
  • @NaguibIhab: lets answer that with an inception joke: "*Yo dawg! I heard you like sql queries, so we've put an sql query in an sql query so you can query while you query*"... – Willem Van Onsem Apr 10 '15 at 11:29
  • For what it's worth, your attempt (SELECT (.|\n)* LIMIT) works for me (in TextPad) if you just remove the space between the asterisk and LIMIT (otherwise the match stops at the first occurrence of LIMIT. – Mark Leiber Apr 10 '15 at 13:01

1 Answers1

2

As @CommuSoft commented SQL queries have a structure that cannot be correctly recognized using regular expressions.

But if you just want to find in your source code all the strings containing queries beginning with SELECT and having the word LIMIT and allowing EOLs between quotation marks, you could use something like this:

\"SELECT([^"]|\n|\"[^";]*\")*LIMIT([^"]|\n|\"[^";]*\")*\";

DEMO here

This regex will match a string delimited with double quotes with the words SELECT and LIMIT, and allowing in between zero or more

  • characters that are not double quotes
  • \n
  • a pair of double quotes with any character in between but ;

This may do what you need, but it depends on what your code looks like. For example if you build your query catenating variables it won't work.

Note that depending on the operating system the end of line mark can be \r\n or \r.

Community
  • 1
  • 1
1010
  • 1,779
  • 17
  • 27
  • Thank you for your answer. I don't understand this alternative: \"[^";]*\", from what I understand from that is you're looking for something like that "" and not that "";" Why would I seek this in the query? – Naguib Ihab Apr 20 '15 at 03:15
  • @NaguibIhab your queries are constructed catenating string literals between double quotes and other expressions. that part looks for things like `".$objArray['catid']."` that cannot habe ; because it would end the line. – 1010 Apr 20 '15 at 03:33
  • note that `[^";]` doesn't match `\n`, so catenation expressions that include \n (that is outside quotation marks) will not be matched. – 1010 Apr 20 '15 at 03:38