Regex to strip comments and multi-line comments and empty lines

Question

I want to parse a file with PHP and use regex to strip:

blank or empty lines
single line comments
multi line comments

Basically, I want to remove any line containing

/* text */

or multi line comments

/***
some
text
*****/

If possible, I'd also like another regex that checks whether a line is empty (to remove blank lines).

Is that possible? Can somebody post a regex that does just that?

Thanks a lot.

Related: http://stackoverflow.com/questions/503871/best-way-to-automatically-remove-comments-from-php-code — user956584, Mar 20 '13 at 13:46

score 52 · Accepted Answer · answered Mar 13 '09 at 15:05


$text = preg_replace('!/\*.*?\*/!s', '', $text);
$text = preg_replace('/\n\s*\n/', "\n", $text);
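As a rough illustration of how these two calls behave, the sketch below runs them on a made-up sample (the contents of `$text` are an assumption, not taken from the question). The `s` modifier lets `.` match newlines, so one pattern covers both single-line and multi-line block comments; the second pattern collapses each run of blank lines into a single newline.

```php
<?php
// Hypothetical sample input: a one-line block comment, a multi-line
// block comment, and the blank lines around them.
$text = "/* header */\n\$a = 1;\n\n/***\nsome\ntext\n*****/\n\n\$b = 2;\n";

// Remove /* ... */ comments; the `s` modifier makes `.` match newlines too.
$text = preg_replace('!/\*.*?\*/!s', '', $text);

// Collapse any run of blank (or whitespace-only) lines into one newline.
$text = preg_replace('/\n\s*\n/', "\n", $text);

echo $text;
```

Note that a single newline left at the very start of the input survives, because the second pattern needs two newlines to match; `ltrim()` would take care of that if it matters.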

answered Mar 13 '09 at 15:05

chaos


Thanks a lot! The first regex removed single line comments. However the second regex did no change and didn't remove multi line comments. I appreciate your response..thanks again – Ahmad Fouad Mar 13 '09 at 15:12
Make sure you have the !s on the first regex; it wasn't in my initial answer. That's what makes it handle multiline comments. The second pattern removes empty lines. – chaos Mar 13 '09 at 15:17
The !s makes it work 100%. It works much better than my regex, +1 from me. – St. John Johnson Mar 13 '09 at 15:29

Thanks! This worked for me. But my code also had common comments // Like so. I managed to also clear these with this regex ```$strData = preg_replace('/(?:(?:\/\*(?:[^*]|(?:\*+[^*\/]))*\*+\/)|(?:(?<!\:|\\\|\'|\")\/\/.*))/', '', $strData);```, that I got from this source: https://stackoverflow.com/a/31907095/2510785 – Jorge Mauricio Apr 17 '22 at 21:31

score 12 · Answer 2 · answered Mar 13 '09 at 15:11

Keep in mind that any regex you use will fail if the file you're parsing has a string containing something that matches these conditions. For example, it would turn this:

print "/* a comment */";

Into this:

print "";

Which is probably not what you want. But maybe it is, I don't know. Anyway, regular expressions in the strict sense can't parse data in a way that avoids this problem. I say "in the strict sense" because modern PCRE regexes have tacked on a number of hacks that make them both capable of doing this and, more importantly, no longer regular expressions, but whatever. If you want to avoid stripping these things inside quotes or in other situations, there is no substitute for a full-blown parser (though it can still be pretty simple).
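One of the PCRE "hacks" alluded to above is the pair of backtracking verbs `(*SKIP)(*FAIL)`. The following is only a minimal sketch under stated assumptions: the sample `$code` is made up, and the pattern protects double-quoted strings only (single-quoted strings, heredocs and `//` comments are not handled).

```php
<?php
// Hypothetical input: a string literal that looks like a comment,
// followed by a real comment.
$code = 'print "/* a comment */"; /* real comment */';

// First alternative: match a whole double-quoted string, then (*SKIP)(*FAIL)
// throws that match away and resumes scanning after it, so nothing inside
// the string can be matched as a comment.
// Second alternative: an ordinary /* ... */ comment, which gets removed.
$pattern = '~"(?:\\\\.|[^"\\\\])*"(*SKIP)(*FAIL)|/\*.*?\*/~s';

$clean = preg_replace($pattern, '', $code);

echo $clean, "\n";
```

This keeps the quoted text intact while still removing the trailing comment, but it is still pattern matching rather than parsing, so the tokenizer remains the safer route for real PHP source.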

score 7 · Answer 3 · answered Oct 02 '13 at 12:15


//  Removes multi-line comments without leaving a blank line behind,
//  including the spaces/tabs around them. The m modifier is needed so
//  that ^ matches at the start of every line, not only at the start of $text.
$text = preg_replace('!^[ \t]*/\*.*?\*/[ \t]*[\r\n]!sm', '', $text);

//  Removes single line '//' comments together with the spaces/tabs before them
$text = preg_replace('![ \t]*//.*[ \t]*[\r\n]!', '', $text);

//  Strip blank lines
$text = preg_replace("/(^[\r\n]*|[\r\n]+)\s*[\r\n]+/", "\n", $text);

answered Oct 02 '13 at 12:15

makaveli_lcf


The single line comment replace doesn't work when there are URLs involved. `https://example.com` is also replaced. – ascx May 15 '17 at 14:49
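One possible way around the URL problem is a negative lookbehind that refuses to treat `//` as a comment start when it directly follows a colon. This is a heuristic sketch, not a fix from the answer's author; the sample `$text` is made up, and a `//` inside a string that is not preceded by `:` would still be stripped.

```php
<?php
// Hypothetical input: a URL inside a string plus trailing // comments.
$text = "\$url = 'https://example.com'; // fetch the page\n\$x = 1; // counter\n";

// (?<!:) is a negative lookbehind: a // directly preceded by a colon
// (as in https://) is not treated as the start of a comment.
// [ \t]* also removes the spaces or tabs in front of the comment.
$text = preg_replace('~[ \t]*(?<!:)//.*~', '', $text);

echo $text;
```

Because `.` does not match newlines without the `s` modifier, the line breaks are kept and only the comment text disappears.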

score 3 · Answer 4 · answered Feb 03 '12 at 15:06


$string = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);
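To see what this pattern does, here is a small sketch that applies it to the two comment shapes from the question (the surrounding `$a`/`$b` lines are made up for illustration). It needs no `s` modifier, because negated character classes such as `[^*]` already match newlines.

```php
<?php
// The two comment shapes from the question, with some code around them.
$string = "/* text */\n\$a = 1;\n/***\nsome\ntext\n*****/\n\$b = 2;\n";

// /\*               opening delimiter
// [^*]*\*+          anything that is not a star, then a run of stars
// ([^/][^*]*\*+)*   if those stars were not followed by '/', keep going
// /                 the closing slash
$string = preg_replace('#/\*[^*]*\*+([^/][^*]*\*+)*/#', '', $string);

echo $string;
```

The comments are removed but the line breaks that followed them stay, so a blank-line cleanup pass is still needed afterwards.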

answered Feb 03 '12 at 15:06

Federico Biccheddu


score 2 · Answer 5 · answered Mar 13 '09 at 15:12


It is possible, but I wouldn't do it. You need to parse the whole PHP file to make sure that you're not removing any necessary whitespace (inside strings, or between keywords and identifiers, which would produce things like publicfunctiondoStuff()). It is better to use PHP's tokenizer extension.

answered Mar 13 '09 at 15:12

soulmerge


I want to rely on regex only. The file is very simple: it has a couple of single-line comments, a multi-line comment, and some PHP code (each statement on its own line). I just want a regex that cleans it up so I can use the output in the browser for a different purpose. – Ahmad Fouad Mar 13 '09 at 15:18
Be aware that the regex-only approach will miss "here documents". To properly identify such text you really do need to use a tokenizer. – Peter Jan 28 '13 at 18:09
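The heredoc point can be illustrated with a short tokenizer sketch. The helper name `strip_with_tokenizer` and the sample source are assumptions made for this example; the only library call relied on is `token_get_all()` from PHP's tokenizer extension.

```php
<?php
// Hypothetical helper: rebuild the source from its tokens, skipping comments.
function strip_with_tokenizer(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as ';' arrive as plain strings.
            $out .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue; // drop comments only
        }
        $out .= $text;
    }
    return $out;
}

// Made-up sample: comment-like text inside a heredoc, then a real comment.
$source = "<?php\n\$s = <<<EOT\n/* not a comment */\nEOT;\n/* real comment */\necho \$s;\n";

$clean = strip_with_tokenizer($source);
echo $clean;
```

The tokenizer reports the heredoc body as string content rather than as a comment, so it is left alone, while a regex such as `!/\*.*?\*/!s` would remove both.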

score 2 · Answer 6 · answered Mar 13 '09 at 15:28


This should replace everything from /* to */ with a newline, as long as the comment is surrounded by whitespace and contains no / character.

$string = preg_replace('/(\s+)\/\*([^\/]*)\*\/(\s+)/s', "\n", $string);

answered Mar 13 '09 at 15:28

St. John Johnson


Appreciate your help as well. Thank you! – Ahmad Fouad Mar 14 '09 at 02:21

score 1 · Answer 7 · answered Oct 04 '12 at 18:59

This is a good function, and WORKS!

<?php
// T_ML_COMMENT does not exist in PHP 5 and later; define it so the
// switch below works on every version.
if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}
function strip_comments($source) {
    $tokens = token_get_all($source);
    $ret = "";
    foreach ($tokens as $token) {
       if (is_string($token)) {
          // Single-character tokens such as ; or { come back as plain strings
          $ret.= $token;
       } else {
          list($id, $text) = $token;

          switch ($id) { 
             case T_COMMENT: 
             case T_ML_COMMENT: // we've defined this
             case T_DOC_COMMENT: // and this
                // Drop every kind of comment
                break;

             default:
                $ret.= $text;
                break;
          }
       }
    }    
    // Also remove the PHP open/close tags ('<?php' must come before '<?')
    return trim(str_replace(array('<?php','<?','?>'),array('','',''),$ret));
}
?>

Now use the strip_comments() function on code held in a variable. The sample is single-quoted so that the double quotes inside it do not end the string:

<?php
$code = '
<?php 
    /* this is comment */
   // this is also a comment
   # me too, am also comment
   echo "And I am some code...";
?>';

$code = strip_comments($code);

echo htmlspecialchars($code);
?>

Viewed in a browser, this displays:

echo "And I am some code...";

Loading from a php file:

<?php
$code = file_get_contents("some_code_file.php");
$code = strip_comments($code);

echo htmlspecialchars($code);
?>

Loading a php file, stripping comments and saving it back

<?php
$file = "some_code_file.php";
$code = file_get_contents($file);
$code = strip_comments($code);

// Overwrite the original file with the stripped code
$f = fopen($file,"w");
fwrite($f,$code);
fclose($f);
?>

Source: http://www.php.net/manual/en/tokenizer.examples.php

This works great, but there is one problem: it does not remove the empty lines left where the comments were. If a file contains 500 lines of comments, the text is removed but the empty lines remain. Can you tell us the proper way to remove these empty lines? — asim-ishaq, May 02 '13 at 07:06
Apply this to the result to remove empty lines: preg_replace('/\n\s*\n/', "\n", $code), or this to remove only the empty lines at the start: preg_replace('/^\n\s*\n/', '', $code) — Eduardo Cuomo, Jun 07 '13 at 15:02
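The blank-line cleanup discussed in these comments can also be written line by line with the `m` modifier. This is a minimal sketch with a made-up `$code` sample; note that it would also delete blank lines inside multi-line strings or heredocs, since it knows nothing about PHP syntax.

```php
<?php
// Hypothetical leftover after comment stripping: empty and
// whitespace-only lines remain between the statements.
$code = "\n\n\$a = 1;\n\n   \t\n\$b = 2;\n";

// With the `m` modifier, ^ matches at the start of every line. A line that
// holds only spaces/tabs is removed together with its line break
// (\R matches \n, \r\n or \r).
$code = preg_replace('/^[ \t]*\R/m', '', $code);

echo $code;
```

Unlike the `/\n\s*\n/` variant, this form also removes blank lines at the very start of the input.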

score 0 · Answer 8 · answered May 31 '12 at 16:59

This is my solution, for anyone who is not used to regexes. The following code removes all comments introduced by # and collects the variables written in the style NAME=VALUE.

  $reg = array();
  $handle = @fopen("/etc/chilli/config", "r");
  if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        // Keep only the part of the line before the first '#'
        $start = strpos($buffer, "#");
        $res = ($start !== false) ? substr($buffer, 0, $start) : $buffer;

        // Split NAME=VALUE at the first '=' only, so values may contain '='
        $a = explode("=", $res, 2);
        $name = trim($a[0]);

        // Skip blank and comment-only lines; a bare NAME gets an empty value.
        // trim() drops the trailing newline that fgets() leaves on the value.
        if ($name !== "")
            $reg[$name] = isset($a[1]) ? trim($a[1]) : "";
    }

    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
  }

score 0 · Answer 9 · answered May 24 '19 at 08:39

I found that this one suits me better: (\s+)\/\*([^\/]*)\*/\n* It removes multi-line comments, indented or not, along with the whitespace around them. Here is an example of a comment that this regex would match.

/**
 * The AdditionalCategory
 * Meta informations extracted from the WSDL
 * - minOccurs : 0
 * - nillable : true
 * @var TestStructAdditionalCategorizationExternalIntegrationCUDListDataContract
 */
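A quick way to check that claim is to run the pattern over a docblock like the one above. In this sketch the class wrapper, the `~` delimiters and the `"\n"` replacement are assumptions made for illustration; the answer does not say which delimiters or replacement it uses.

```php
<?php
// Hypothetical source: a shortened version of the docblock above inside a class.
$source = "class Foo\n{\n    /**\n     * The AdditionalCategory\n     * - minOccurs : 0\n     */\n    public \$AdditionalCategory;\n}\n";

// (\s+) consumes the whitespace before the comment and \n* the newlines
// after it, so a newline is put back to keep the lines apart.
$clean = preg_replace('~(\s+)/\*([^/]*)\*/\n*~', "\n", $source);

echo $clean;
```

Since `[^/]*` cannot cross a slash, a comment that contains one (a URL or a file path, for instance) is left in place by this pattern.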
