9

I'm trying to come up with a regular expression to remove comments from an SQL statement.

This regex almost works:

(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|'(?:[^']|'')*'|(--.*)

Excepth that last part doesn't handle "--" comments very well. The problem is handling SQL strings, delimited with ''.

For example, if i have

SELECT ' -- Hello -- ' FROM DUAL

It shouldn't match, but it's matching.

This is in ASP/VBscript.

I've thought about matching right-to-left but i don't think the VBScript's regex engine supports it. Also tried fiddling with negative lookbehind but the results weren't good.

casperOne
  • 73,706
  • 19
  • 184
  • 253
Nuno Leong
  • 91
  • 1
  • 1
  • 3

9 Answers9

7

In PHP, i'm using this code to uncomment SQL:

$sqlComments = '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
/* Commented version
$sqlComments = '@
    (([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
    |(                   # $3 : Match comments
        (?:\#|--).*?$    # - Single line comments
        |                # - Multi line (nested) comments
         /\*             #   . comment open marker
            (?: [^/*]    #   . non comment-marker characters
                |/(?!\*) #   . ! not a comment open
                |\*(?!/) #   . ! not a comment close
                |(?R)    #   . recursive case
            )*           #   . repeat eventually
        \*\/             #   . comment close marker
    )\s*                 # Trim after comments
    |(?<=;)\s+           # Trim after semi-colon
    @msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );
Adrien Gibrat
  • 917
  • 10
  • 13
  • This is stellar but I didn't like the trimming at the end, because it can remove newlines that might actually be desirable/necessary (as when an inline comment after code has no space before it... people do this :| ). Also added backticks to the quote list. So I'm using: $sqlComments = '@(([\'"`]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)+@ms'; – dkloke Dec 07 '14 at 05:55
  • This regexp segfaults (php 5.6) or returns NULL (php 7+) on queries with long comments at the beginning, eg ` /* put here 8kb of dummy text */ SELECT 1; ` – Ondřej Hlaváček Aug 12 '17 at 02:44
  • I ran around 120k queries through this regexp and it has some major flaws in detecting comments in the middle of a query. Eg properly encapsulated strings containing "--" (double dash string) are removed. – Ondřej Hlaváček Aug 15 '17 at 18:31
  • Testing it on https://regex101.com says: `Your pattern contains one or more errors: * Character range is out of order * Unmatched parenthesis` – rapt Dec 28 '17 at 23:01
  • @rapt, it's because of the php escape 3x`\` in '[^\\\]'... if you want to use the regexp outside php: (([\'"]).*?[^\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+ – Adrien Gibrat Feb 21 '18 at 08:05
  • does not works if on the end there is opened and not closed comment using /* – Diego Favero Feb 17 '22 at 01:42
4

This code works for me:

function strip_sqlcomment ($string = '') {
    $RXSQLComments = '@(--[^\r\n]*)|(\#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)@ms';
    return (($string == '') ?  '' : preg_replace( $RXSQLComments, '', $string ));
}

with a little regex tweak it could be used to strip comments in any language

gonzalezea
  • 472
  • 3
  • 7
3

Originally, I used @Adrien Gibrat's solution. However, I came across a situation where it wasn't parsing quoted strings, properly, if I had anything with a preceding '--' inside of them. I ended up writing this, instead:

'[^']*(?!\\)'(*SKIP)(*F)       # Make sure we're not matching inside of quotes
|(?m-s:\s*(?:\-{2}|\#)[^\n]*$) # Single line comment
|(?:
  \/\*.*?\*\/                  # Multi-line comment
  (?(?=(?m-s:\h+$))         # Get trailing whitespace if any exists and only if it's the rest of the line
    \h+
  )
)

# Modifiers used: 'xs' ('g' can be used as well, but is enabled by default in PHP)

Please note that this should be used when PCRE is available. So, in my case, I'm using a variation of this in my PHP library.

Example

Erutan409
  • 730
  • 10
  • 21
2

As you said that the rest of your regex is fine, I focused on the last part. All you need to do is verify that the -- is at the beginning and then make sure it removes all dashes if there are more than 2. The end regex is below

(^[--]+)

The above is just if you want to remove the comment dashes and not the whole line. You can run the below if you do want everything after it to the end of the line, also

(^--.*)
Justin Pihony
  • 66,056
  • 18
  • 147
  • 180
  • Hi Justin... thanks for the help. There still remains the problem with inline comments that don't start in the beginning. Like SELECT ' -- Hello -- ' FROM DUAL -- comment that should be removed – Nuno Leong Mar 14 '12 at 14:04
  • No problem, and welcome to stack overflow. Please remember that the way of showing appreciation here is through upvotes and accepted answers (checkmark next to answer). More information can be found in the [FAQ], especially [FAQ#HowToAsk] How To Ask – Justin Pihony Mar 14 '12 at 14:09
1

remove /**/ and -- comments

function unComment($sql){

        $re = '/(--[^\n]*)/i';
        $sql = preg_replace( $re, '', $sql );

        $sqlComments = '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
        $uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
        preg_match_all( $sqlComments, $sql, $comments );
        $sql = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', trim($uncommentedSQL));


        return $sql;
    }
Mansour Alnasser
  • 4,446
  • 5
  • 40
  • 51
0

Please see my answer here. It works both for line comments and for block comments, even nested block comments. I guess you need to use regex with balancing groups, which AFAIK is not available in VBScript.

Community
  • 1
  • 1
drizin
  • 1,737
  • 1
  • 18
  • 44
0

For Node.js, see pg-minify library. It works with PostgreSQL, MS-SQL and MySQL scripts.

It can handle all types of comments, plus compress the resulting SQL to its bare minimum, to optimize what needs to be sent to the server.

vitaly-t
  • 24,279
  • 15
  • 116
  • 138
0

Three regular expressions in an array for a preg_replace is just as fast as a single complex expression. Example for PHP:

function removeSqlComment($sqlString){
    $regEx = [
      '~(?:".*?"|\'.*?\')(*SKIP)(*F)|--.*$~m',
      '~(?:".*?"|\'.*?\')(*SKIP)(*F)|/\*.*?\*/~s',
      '~^;?\R~m'
    ];
    return trim(preg_replace($regEx, '', $sqlString));
}

//test
$sqlWithComment = <<<SQL
-- first Comment;
Delete * from /* table1 */table where s = '--'; -- comm2
/*
 * comment 3
 */
SELECT ' -- Hello -- ' FROM DUAL;
SQL;

$sql = removeSqlComment($sqlWithComment);

$expected = "Delete * from table where s = '--'; \nSELECT ' -- Hello -- ' FROM DUAL;";

var_dump($sql === $expected);  //bool(true)
jspit
  • 7,276
  • 1
  • 9
  • 17
-1

For all PHP folks: please use this library - https://github.com/jdorn/sql-formatter. I have been dealing with stripping comments from SQL for couple years now and the only valid solution would be a tokenizer/state machine, which I lazily resisted to write. Couple days ago I found out this lib and ran 120k queries through it and found only one bug (https://github.com/jdorn/sql-formatter/issues/93), which is fixed immediately in our fork https://github.com/keboola/sql-formatter.

The usage is simple

$query <<<EOF
/* 
  my comments 
*/
SELECT 1;
EOF;

$bareQuery = \SqlFormatter::removeComments($query);
// prints "SELECT 1;"
print $bareQuery;