
I need to extract the contents of executable comments from mysqldump output, but with the regexp

/\/\*\!\d+\s+(.*?)\*\//s

and input data like this:

/*!50003 text
some text else
/*
comment
also comment
*/
text...
and also text...
*/

I get the wrong result, because the match only covers the lines from "text" to "also comment". How can I skip a comment nested inside the comment? Thanks.
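A minimal sketch that reproduces the problem, assuming the sample input above is stored in a variable (the name $input is only illustrative):

```php
<?php

// The pattern from the question, unchanged.
$re = '/\/\*\!\d+\s+(.*?)\*\//s';

// The sample input from the question.
$input = "/*!50003 text\nsome text else\n/*\ncomment\nalso comment\n*/\ntext...\nand also text...\n*/";

preg_match($re, $input, $m);

// The lazy (.*?) stops at the first "*/" it meets, which is the one that
// closes the inner comment, so everything after "also comment" is lost.
echo $m[1];
```

The captured group ends right after the "also comment" line, because the pattern has no notion of nesting.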

UPD: I cannot use "^" and "$" to mark the start and end of the input, because I have a lot of executable statements in the input.

UPD2: Output I want:

text
some text else
/*
comment
also comment
*/
text...
and also text...

NOT the whole input, as in the comment below. It would be very strange, I think, to get the same output as the input.

UPD3: An executable comment must start with /*!ANYNUMBER. This marker must be skipped and not included in the output. An executable comment simply ends with */. An example of the correct output is given in "UPD2".

Guy Fawkes
  • Do you want to "unwrap" the outermost comment for all comments, stripping certain parts of the comment, or do you only want to do it for comments beginning with a certain pattern? What is the exact pattern for the first part? Is it exactly "!50003 text", or can the number be anything? What's the exact pattern for the "text" portion? Is it a single word? Everything until the new line? What's the [overall goal](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)? – outis Jan 14 '12 at 11:52
  • Sure, I asked about the "common" problem, not cases where "!50003" is hard-coded, for example. – Guy Fawkes Jan 14 '12 at 12:29

2 Answers


Pure regular expressions can't handle nesting, but PHP's flavor (PCRE) can, by using recursion. The pattern below uses the PCRE_EXTENDED modifier (x) so that it can contain whitespace and comments:

%(               # opening RE delimiter, group start
  /\*            # comment open marker
    (  [^/*]     # non comment-marker characters
     | /(?!\*)   # '/' not followed by '*', so not a comment open
     | \*(?!/)   # '*' not followed by '/', so not a comment close
     | (?R)      # recursive case
    )*           # repeat any number of times
  \*/            # comment close marker
)%x              # group end, closing RE delimiter, PCRE_EXTENDED

In short:

%(/\*([^/*]|/(?!\*)|\*(?!/)|(?R))*\*/)%x

In use:

<?php

// (?1) recurses into group 1, which spans the whole pattern here, so it behaves like (?R).
$commentRE = '%(/\*([^/*]|/(?!\*)|\*(?!/)|(?1))*\*/)%';
$doc = <<<EOS

USE database;

/* comment
and a
/* nested comment /* me too */
   now exiting
 */
the comment */


/*!50003 text
some text else
/*
comment
also comment
*/
text...
and also text...
*/

CREATE TABLE IF NOT EXISTS ...

EOS;

preg_match_all($commentRE, $doc, $parts);
var_export($parts[0]);

Result:

array (
  0 => '/* comment
and a
/* nested comment /* me too */
   now exiting
 */
the comment */',
  1 => '/*!50003 text
some text else
/*
comment
also comment
*/
text...
and also text...
*/',
)
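
Note that this pattern matches every comment, executable or not, and keeps the delimiters. For the output asked for in the question, one possible adaptation is sketched below. The function name extractExecutableComments and the rtrim call are assumptions for illustration, not part of the pattern above, and comment markers inside quoted strings are not handled.

```php
<?php

// Hypothetical helper: returns the body of each executable comment,
// without the leading "/*!NNNNN" marker and without the closing "*/".
function extractExecutableComments(string $sql): array
{
    $re = '%
        /\*!\d+\s+              # executable comment opener, e.g. "/*!50003 "
        (                       # group 1: the body we want to keep
          (?: [^/*]             #   any character that is not part of a marker
            | /(?!\*)           #   "/" that does not open a comment
            | \*(?!/)           #   "*" that does not close a comment
            | (?2)              #   one complete nested comment
          )*
        )
        \*/                     # closing marker of the executable comment
        (?(DEFINE)              # never matched in place, only called via (?2)
          ( /\* (?: [^/*] | /(?!\*) | \*(?!/) | (?2) )* \*/ )
        )
    %x';

    preg_match_all($re, $sql, $m);

    // Assumption: trailing whitespace before the closing marker is not wanted.
    return array_map('rtrim', $m[1]);
}

$sql = "USE db;\n/* plain /* nested */ comment */\n"
     . "/*!50003 text\n/*\ncomment\n*/\nmore text\n*/\n";

var_export(extractExecutableComments($sql));
```

Only the executable comment is reported, and the nested plain comment stays inside its body. The plain comment at the top is skipped because it does not start with "/*!" followed by digits.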
outis

Based on this excellent solution, I've written a PHP regexp to remove all types of comments (and only comments, not quoted text that looks like comments ;): Regex to match MySQL comments

Adrien Gibrat