57

What’s the best way to remove comments from a PHP file?

I want to do something similar to php_strip_whitespace(), but it shouldn't remove the line breaks as well.

For example,

I want this:

<?PHP
// something
if ($whatsit) {
    do_something(); # we do something here
    echo '<html>Some embedded HTML</html>';
}
/* another long
comment
*/
some_more_code();
?>

to become:

<?PHP
if ($whatsit) {
    do_something();
    echo '<html>Some embedded HTML</html>';
}
some_more_code();
?>

(Although if the empty lines remain where comments are removed, that wouldn't be OK.)

It may not be possible, because of the requirement to preserve embedded HTML; that’s what has tripped up the solutions I’ve found on Google.

Peter Mortensen
benlumley
  • Look into obfuscators, although you'd have to find one that was configurable to strip comments only. – Michael Haren Feb 02 '09 at 16:52
  • Someone is bound to ask why: the code needs to go to a client's server to be deployed, so we want to make sure nothing is there that shouldn't be. – benlumley Feb 02 '09 at 16:52
  • Are you talking about inappropriate content in the comments? Or is this just for size - smaller PHP scripts make almost no performance difference except in high usage or unusual cases (and Zend is usually a better answer than stripping them). – Adam Davis Feb 02 '09 at 17:02
  • It's because there are things in the comments that we don't want to risk being read. They shouldn't be there, but it's too late for that now. – benlumley Feb 02 '09 at 17:04
  • I'd be reluctant to remove comments unless you're doing obfuscation. You may find a time when you need those comments on the client's server. Also, have you made it clear to them that the code is coming with comments? They may not like the surprise when they bring in different consultants... – Adam Davis Feb 02 '09 at 17:04
  • @benlumley will you look into this question for a moment http://stackoverflow.com/questions/14040560/brandonaaron-jquery-mousewheel-fix-maximum-value – Vivek Dragon Dec 26 '12 at 12:39

15 Answers

70

I'd use the tokenizer. Here's my solution; it should work on both PHP 4 and PHP 5:

$fileStr = file_get_contents('path/to/file');
$newStr  = '';

$commentTokens = array(T_COMMENT);
    
if (defined('T_DOC_COMMENT')) {
    $commentTokens[] = T_DOC_COMMENT; // PHP 5
}

if (defined('T_ML_COMMENT')) {
    $commentTokens[] = T_ML_COMMENT;  // PHP 4
}

$tokens = token_get_all($fileStr);

foreach ($tokens as $token) {    
    if (is_array($token)) {
        if (in_array($token[0], $commentTokens)) {
            continue;
        }
        
        $token = $token[1];
    }

    $newStr .= $token;
}

echo $newStr;
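A minimal sketch of the same token loop wrapped in a reusable function. The function name strip_php_comments() is hypothetical, and the code assumes PHP 7 or later because of the type declarations. The lexer decides what counts as a comment, so a # or // inside a string literal such as a URL is left alone:

```php
<?php
// Hypothetical wrapper around the accepted answer's token loop.
function strip_php_comments(string $source): string
{
    $commentTokens = [T_COMMENT, T_DOC_COMMENT];
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Skip comment tokens entirely; keep every other token verbatim
            if (in_array($token[0], $commentTokens, true)) {
                continue;
            }
            $token = $token[1];
        }
        $out .= $token;
    }
    return $out;
}

// Made-up sample: the '#' and '//' inside the string are not comments
$code = <<<'SRC'
<?php
// leading comment
$url = 'http://example.com/#anchor'; # trailing comment
/** docblock */
echo $url;
SRC;

echo strip_php_comments($code);
```

One caveat relative to the question: the line break after a comment is a separate whitespace token for /* */ comments, and since PHP 8.0 also for // and # comments, so a comment that sat on its own line can leave an empty line behind.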
Tomas Votruba
Ionuț G. Stan
  • You should take the `$commentTokens` initialization out of the `foreach` block; otherwise, +1 and thanks :) – raveren Oct 10 '10 at 19:17
  • @Raveren, you're damn right. I have no idea what was in my mind back then to put that piece of code inside the loop. Thanks for pointing it out. – Ionuț G. Stan Oct 11 '10 at 07:39
  • @IonuțG.Stan I have been trying to implement this, but it's breaking a lot of code. Here's an example: ``` ### Version ### const MARKDOWNLIB_VERSION = "1.6.0"; ### Simple Function Interface ### public static function defaultTransform($text) { ``` Becomes ``` ### Version # const MARKDOWNLIB_VERSION = "1.6.0"; ### Simple Function Interface # public static function defaultTransform($text) { ``` Not sure if this will format well here... – Andrew Christensen Oct 28 '16 at 01:33
  • @AndrewChristensen I can't reproduce it. What PHP version are you using? – Ionuț G. Stan Oct 29 '16 at 14:23
  • How do we use this code snippet? Create a file cleanup.php and put this code block in it? Say we clean up a file called index.php: do we take the output of $newStr and paste it back into index.php? Is that how this works? – MarcoZen May 28 '17 at 19:50
48

Use php -w <sourcefile> to generate a file stripped of comments and whitespace, and then use a beautifier like PHP_Beautifier to reformat for readability.

Peter Mortensen
Paul Dixon
10
$fileStr = file_get_contents('file.php');
foreach (token_get_all($fileStr) as $token ) {
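    // Single-character tokens are plain strings; skip them and any non-comment token
    if (!is_array($token) || $token[0] != T_COMMENT) {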
        continue;
    }
    $fileStr = str_replace($token[1], '', $fileStr);
}

echo $fileStr;
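One thing to watch with this str_replace() variant: it deletes every occurrence of a comment's text anywhere in the file, not just the comment token itself. The small sketch below uses made-up sample code to compare it with rebuilding the output from the tokens, as the accepted answer does:

```php
<?php
// Sample where a string literal happens to contain the same text as a comment
$src = <<<'SRC'
<?php
$marker = '/* keep me */';
/* keep me */
echo $marker;
SRC;

$viaReplace = $src;
$rebuilt = '';
foreach (token_get_all($src) as $token) {
    if (is_array($token) && $token[0] === T_COMMENT) {
        // str_replace() strategy: delete the comment text wherever it appears
        $viaReplace = str_replace($token[1], '', $viaReplace);
        continue; // token-rebuild strategy: simply skip this token
    }
    $rebuilt .= is_array($token) ? $token[1] : $token;
}

var_dump($viaReplace); // the string literal has been emptied too
var_dump($rebuilt);    // the string literal is intact
```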
Peter Mortensen
Tom Haigh
9

Here's the function posted above, modified to recursively remove all comments from all PHP files within a directory and all its subdirectories:

function rmcomments($id) {
    if (!file_exists($id)) {
        return;
    }
    if (is_dir($id)) {
        // Recurse into every entry except "." and ".."
        $handle = opendir($id);
        while (($file = readdir($handle)) !== false) {
            if ($file != "." && $file != "..") {
                rmcomments($id . "/" . $file);
            }
        }
        closedir($handle);
    } elseif (is_file($id) && pathinfo($id, PATHINFO_EXTENSION) == "php") {
        // Warning: 0777 makes the file world-writable
        if (!is_writable($id)) { chmod($id, 0777); }
        if (is_writable($id)) {
            $fileStr = file_get_contents($id);
            $newStr  = '';
            $commentTokens = array(T_COMMENT);
            if (defined('T_DOC_COMMENT')) { $commentTokens[] = T_DOC_COMMENT; }
            if (defined('T_ML_COMMENT'))  { $commentTokens[] = T_ML_COMMENT; }
            foreach (token_get_all($fileStr) as $token) {
                if (is_array($token)) {
                    // Drop comment tokens; keep everything else as written
                    if (in_array($token[0], $commentTokens)) { continue; }
                    $token = $token[1];
                }
                $newStr .= $token;
            }
            file_put_contents($id, $newStr);
        }
    }
}

rmcomments("path/to/directory");
Peter Mortensen
John Tyler
4

A more powerful version that removes all comments from the PHP files in the folder and its subfolders:

<?php
    // Collect every .php file below this script's directory
    $di = new RecursiveDirectoryIterator(__DIR__, RecursiveDirectoryIterator::SKIP_DOTS);
    $it = new RecursiveIteratorIterator($di);
    $fileArr = [];
    foreach ($it as $file) {
        if ($file->getExtension() == "php") {
            $fileArr[] = $file->getPathname();
        }
    }
    $arr = [T_COMMENT, T_DOC_COMMENT];
    foreach ($fileArr as $path) {
        $fileStr = file_get_contents($path);
        foreach (token_get_all($fileStr) as $token) {
            // Only array tokens can be comments
            if (is_array($token) && in_array($token[0], $arr)) {
                $fileStr = str_replace($token[1], '', $fileStr);
            }
        }
        file_put_contents($path, $fileStr);
    }
Peter Mortensen
3

Building on the accepted answer, I also needed to preserve the file's line numbers, so here is a variation of it:

    /**
     * Removes the php comments from the given valid php string, and returns the result.
     *
     * Note: a valid php string must start with <?php.
     *
     * If the preserveWhiteSpace option is true, comments are replaced with line breaks so that
     * the line numbers are preserved.
     *
     *
     * @param string $str
     * @param bool $preserveWhiteSpace
     * @return string
     */
    function removePhpComments(string $str, bool $preserveWhiteSpace = true): string
    {
        $commentTokens = [
            \T_COMMENT,
            \T_DOC_COMMENT,
        ];
        $tokens = token_get_all($str);


        if (true === $preserveWhiteSpace) {
            $lines = explode(PHP_EOL, $str);
        }


        $s = '';
        foreach ($tokens as $token) {
            if (is_array($token)) {
                if (in_array($token[0], $commentTokens)) {
                    if (true === $preserveWhiteSpace) {
                        $comment = $token[1];
                        $lineNb = $token[2];
                        $firstLine = $lines[$lineNb - 1];
                        $p = explode(PHP_EOL, $comment);
                        $nbLineComments = count($p);
                        if ($nbLineComments < 1) {
                            $nbLineComments = 1;
                        }
                        $firstCommentLine = array_shift($p);

                        $isStandAlone = (trim($firstLine) === trim($firstCommentLine));

                        if (false === $isStandAlone) {
                            if (2 === $nbLineComments) {
                                $s .= PHP_EOL;
                            }

                            continue; // Just remove inline comments
                        }

                        // Stand-alone case
                        $s .= str_repeat(PHP_EOL, $nbLineComments - 1);
                    }
                    continue;
                }
                $token = $token[1];
            }

            $s .= $token;
        }
        return $s;
    }

Note: this is for PHP 7+ (I didn't care about backward compatibility with older PHP versions).
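If all you need is stable line numbers, a simpler sketch is to replace each comment with only the line-break characters it contained. This is not the function above. The helper name is hypothetical, and unlike removePhpComments() it does not distinguish stand-alone from inline comments, so trailing spaces can be left where an inline comment was:

```php
<?php
// Hypothetical helper: drop comment text but keep its line breaks, so every
// remaining line stays on its original line number.
function strip_comments_keep_lines(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            if ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
                // Keep only "\r" and "\n" characters (so CRLF files stay CRLF)
                $out .= preg_replace('/[^\r\n]+/', '', $token[1]);
                continue;
            }
            $token = $token[1];
        }
        $out .= $token;
    }
    return $out;
}

$src = "<?php\n/**\n * Doc\n */\nfunction f() {} // trailing\n/* one */ echo 1;\n";
echo strip_comments_keep_lines($src);
```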

Peter Mortensen
ling
2
/*
* T_ML_COMMENT does not exist in PHP 5.
* The following three lines define it in order to
* preserve backwards compatibility.
*
* The next two lines define the PHP 5 only T_DOC_COMMENT,
* which we will mask as T_ML_COMMENT for PHP 4.
*/

if (! defined('T_ML_COMMENT')) {
    define('T_ML_COMMENT', T_COMMENT);
} else {
    define('T_DOC_COMMENT', T_ML_COMMENT);
}

/*
 * Remove all comments in $file
 */

function remove_comment($file) {
    $comment_token = array(T_COMMENT, T_ML_COMMENT, T_DOC_COMMENT);

    $input = file_get_contents($file);
    $tokens = token_get_all($input);
    $output = '';

    foreach ($tokens as $token) {
        if (is_string($token)) {
            $output .= $token;
        } else {
            list($id, $text) = $token;

            // Keep every token that is not a comment
            if (!in_array($id, $comment_token)) {
                $output .= $text;
            }
        }
    }

    file_put_contents($file, $output);
}

/*
 * Glob recursive
 * @return ['dir/filename', ...]
 */

function glob_recursive($pattern, $flags = 0) {
    $file_list = glob($pattern, $flags);

    $sub_dir = glob(dirname($pattern) . '/*', GLOB_ONLYDIR);
    // If subdirectories exist
    if (count($sub_dir) > 0) {
        $file_list = array_merge(
            glob_recursive(dirname($pattern) . '/*/' . basename($pattern), $flags),
            $file_list
        );
    }

    return $file_list;
}

// Remove all comments from '*.php' files, including subdirectories
foreach (glob_recursive('*.php') as $file) {
    remove_comment($file);
}
Steely Wing
2

If you already use an editor like UltraEdit, you can open one or more PHP files and then use a simple Find & Replace (Ctrl + R) with the following Perl regular expression:

(?s)/\*.*?\*/

The ? after .* makes the match lazy, so each comment is matched separately. With a greedy .* the match would run from the first /* in the file to the last */ and delete all the code in between. Beware: this regular expression also removes comment-like text inside strings. For example, in echo "hello/*babe*/"; the /*babe*/ would be removed too. So it is only a practical solution if you have a few files to clean up. To be absolutely sure it does not wrongly replace something that is not a comment, you would have to run Find & Replace interactively and approve each replacement.
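Whether the quantifier is greedy (.*) or lazy (.*?) matters a great deal for this kind of pattern. This short sketch uses PHP's preg_replace(), which also uses Perl-compatible syntax, on a made-up line containing two comments. UltraEdit's regex engine may differ in details.

```php
<?php
// Greedy vs. lazy matching for /* ... */ comments (the s flag lets . span lines)
$text = "a(); /* one */ b(); /* two */ c();";

$greedy = preg_replace('#/\*.*\*/#s', '', $text);  // from the first /* to the LAST */
$lazy   = preg_replace('#/\*.*?\*/#s', '', $text); // each comment separately

echo $greedy, "\n"; // a();  c();
echo $lazy, "\n";   // a();  b();  c();
```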

Peter Mortensen
Marco Demaio
2

Bash solution: to recursively remove comments from all PHP files, starting from the current directory, you can run this one-liner in the terminal. (It uses a temp1 file to hold the PHP content during processing.)

Note that this strips most whitespace, including line breaks, along with the comments.

 find . -type f -name '*.php' | while IFS= read -r VAR; do php -wq "$VAR" > temp1; cat temp1 > "$VAR"; done

Afterwards, remove the temp1 file.

If PHP_Beautifier is installed, you can get nicely formatted code without comments with:

 find . -type f -name '*.php' | while IFS= read -r VAR; do php -wq "$VAR" > temp1; php_beautifier temp1 > temp2; cat temp2 > "$VAR"; done

Then remove the two temporary files (temp1 and temp2).

Peter Mortensen
Pawel Dubiel
1

For Ajax and JSON responses, I use the following PHP code to remove comments from HTML/JavaScript code, so the response is smaller (about a 15% saving for my code).

// Replace doubled spaces with single ones (ignored in HTML anyway)
$html = preg_replace('@(\s){2,}@', '\1', $html);
// Remove single and multiline comments, tabs and newline chars
$html = preg_replace(
    '@(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|((?<!:)//.*)|[\t\r\n]@i',
    '',
    $html
);

It is short and effective, but it can produce unexpected results if your code has bad syntax.

Peter Mortensen
Deele
  • Doesn't this regexp remove comments in strings? E.g., in `echo "hello /*baby*/ boy";` wouldn't your regexp mistakenly remove `/*baby*/` in the string? – Marco Demaio Feb 25 '14 at 13:13
  • @MarcoDemaio It will. To avoid that, you need a parser, not a simple regex, because you have to track the quoting state and know where a comment actually starts and ends. JSON is not meant for complex data structures, and you should avoid situations where single- or multi-line comments could appear inside the data. – Deele Feb 25 '14 at 16:43
1

Run the command php --strip file.php in a command prompt (for example, cmd.exe), and then browse to WriteCodeOnline.

Here, file.php is your own file.

1

Peter Mortensen
1

In 2019 it could work like this:

<?php
/*   hi there !!!
here are the comments */
//another try

echo removecomments('index.php');

/*   hi there !!!
here are the comments */
//another try
function removecomments($f){
    $w=Array(';','{','}');
    $ts = token_get_all(php_strip_whitespace($f));
    $s='';
    foreach($ts as $t){
        if(is_array($t)){
            $s .=$t[1];
        }else{
            $s .=$t;
            if( in_array($t,$w) ) $s.=chr(13).chr(10);
        }
    }

    return $s;
}

?>

To see the result, run it in XAMPP. You get a blank page, but if you right-click and choose View Source, you see your PHP script: it loads itself and removes all comments and tabs.

I prefer this solution because I use it to speed up "m.php", the single-file engine of my framework. With php_strip_whitespace alone (without this script re-adding line breaks), the source was the slowest in my tests: I ran 10 benchmarks and calculated the average. (I think PHP 7 either restores the missing CR/LF characters when it parses the file, or takes longer when they are missing.)

Peter Mortensen
0

The catch is that a less robust matching algorithm (simple regex, for instance) will start stripping here when it clearly shouldn't:

if (preg_match('#^/*' . $this->index . '#', $this->permalink_structure)) {  

It might not affect your code, but eventually someone will get bitten by your script. So you will have to use a utility that understands more of the language than you might otherwise expect.
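To make that concrete, here is a small sketch that runs a lazy block-comment regex and the tokenizer over the same code. The sample line is a simplified version of the one above, and its variable names are illustrative only.

```php
<?php
// A string literal that contains "/*", followed by a real comment
$src = <<<'SRC'
<?php
if (preg_match('#^/*' . $index . '#', $structure)) { /* real comment */ }
SRC;

// Naive: lazy regex for block comments, unaware of string literals
$naive = preg_replace('#/\*.*?\*/#s', '', $src);

// Tokenizer: only real comment tokens are dropped
$tokenized = '';
foreach (token_get_all($src) as $token) {
    if (is_array($token) && in_array($token[0], [T_COMMENT, T_DOC_COMMENT], true)) {
        continue;
    }
    $tokenized .= is_array($token) ? $token[1] : $token;
}

echo $naive, "\n";     // the string literal and most of the condition are gone
echo $tokenized, "\n"; // only /* real comment */ is gone
```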

Peter Mortensen
Adam Davis
0

php -w or php_strip_whitespace($filename);

See the php_strip_whitespace() documentation in the PHP manual.
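php_strip_whitespace() takes a file name rather than a string of code. This minimal sketch assumes a writable system temp directory and uses made-up sample source. It writes the code to a temporary file first, then strips it:

```php
<?php
// Made-up sample source with three kinds of comments
$source = <<<'SRC'
<?php
// a comment
function greet($name) {
    /* block comment */
    return 'Hello, ' . $name; # trailing comment
}
echo greet('world');
SRC;

// php_strip_whitespace() needs a path, so write the sample to a temp file
$path = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents($path, $source);

$stripped = php_strip_whitespace($path);
unlink($path);

echo $stripped, "\n";
```

Comments are removed and whitespace is collapsed to single spaces, so most line breaks are lost. That is why other answers suggest running a beautifier afterwards.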

Gam Sengie
0

For me, this works correctly on Linux and Windows:

$php_0com = file_get_contents('file.php');
$php_0com = preg_replace('@/\*.*?\*/|\n\r@s', '', $php_0com);
$php_0com = trim(preg_replace('@(^<\?|//|#).*\r\n@', '', $php_0com));
print_r($php_0com);

Note the difference between the two lines of code:

@regex@s vs @regex@

The s modifier makes . match newline characters as well, so a /* ... */ comment that spans several lines can be matched: https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
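The following sketch shows what the s modifier changes, using a made-up comment that spans two lines:

```php
<?php
// Without s, "." stops at line breaks, so a multi-line comment is not matched
$code = "a();\n/* first line\n   second line */\nb();";

$withoutS = preg_replace('@/\*.*?\*/@', '', $code);
$withS    = preg_replace('@/\*.*?\*/@s', '', $code);

var_dump($withoutS === $code);         // bool(true): nothing was removed
var_dump($withS === "a();\n\nb();");   // bool(true): the comment is gone
```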

Yamile