
Using this regular expression:

preg_replace( '/<!--(?!<!)[^\[>].*?-->/', '', $output )

I'm able to remove all HTML comments from my page except for anything that looks like this:

<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->

How can I modify this so that it also leaves in place any HTML comment that includes a unique phrase, such as "batcache"?

So an HTML comment like this:

<!--
generated 37 seconds ago
generated in 0.978 seconds
served from batcache in 0.004 seconds
expires in 263 seconds
-->

won't be removed.


This code seems to do the trick:

preg_replace_callback( '/<!--([\s\S]*?)-->/', function( $c ) { return ( strpos( $c[1], '<![' ) !== false || strpos( $c[1], 'batcache' ) !== false ) ? $c[0] : ''; }, $output )
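For reference, here is a minimal standalone sketch of the callback approach. It uses preg_replace_callback, since preg_replace itself does not accept a closure as the replacement. The sample markup and the $keep list are made up for illustration and are not taken from a real page.

```php
<?php
// Hypothetical sample page; the markup is made up for illustration.
$output = <<<'HTML'
<p>Hello</p>
<!-- plain comment -->
<!--[if IE 6]>
    Special instructions for IE 6 here
<![endif]-->
<!--
served from batcache in 0.004 seconds
-->
HTML;

// Keep a comment when its body contains any of these phrases (assumed list).
$keep = ['<![', 'batcache'];

$clean = preg_replace_callback(
    '/<!--([\s\S]*?)-->/',
    function (array $m) use ($keep) {
        foreach ($keep as $phrase) {
            if (strpos($m[1], $phrase) !== false) {
                return $m[0]; // leave this comment untouched
            }
        }
        return ''; // drop every other comment
    },
    $output
);

echo $clean;
```

Keeping the phrases in an array means another exception can be added without touching the pattern.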
Rich
  • why don't you use `strip_tags`? and add back the special conditional comments? – Daniel A. White Feb 11 '15 at 19:55
  • **Don't use regular expressions to parse HTML. Use a proper HTML parsing module.** You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php or [this SO thread](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Feb 11 '15 at 19:56

1 Answer

This should remove all comments that contain neither "batcache" nor "[endif]". Matching is done between the two delimiters <!-- and -->.

$result = preg_replace("/<!--((?!batcache)(?!\\[endif\\])[\\s\\S])*?-->/", "", $str);

You can test it here.

As other users have already stated, it's not always safe to parse HTML with regex, but if you are reasonably sure what kind of HTML you will be parsing, it should work as expected. If the regex doesn't match some particular use case, let me know.
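As a quick check, the pattern can be run against a small sample. This is only a sketch: the input string is made up, the pattern is written in a single-quoted string (so the backslashes don't need doubling), and a non-capturing group is swapped in for the capturing one since the captured text isn't used.

```php
<?php
// Sample input, made up for illustration.
$str = "<div>kept</div>\n"
     . "<!-- ordinary comment -->\n"
     . "<!--[if IE 6]>\n    Special instructions for IE 6 here\n<![endif]-->\n"
     . "<!--\nserved from batcache in 0.004 seconds\n-->\n";

// No position inside the comment body may start "batcache" or "[endif]".
$pattern = '/<!--(?:(?!batcache)(?!\[endif\])[\s\S])*?-->/';

$result = preg_replace($pattern, '', $str);

// preg_replace returns null on error (for example, when a PCRE limit is hit),
// so fall back to the unmodified input in that case.
if ($result === null) {
    $result = $str;
}

echo $result;
```

Only the ordinary comment is removed; the conditional comment and the batcache comment are left in place because the lookaheads stop the match before it can reach the closing -->.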

ntrp
  • Thanks man that's nearly exactly what I was looking for, but what happened to the conditional comment exceptions? I updated my question to show the code I got working. Also, I totally understand what @AndyLester was saying about regex parsing, but in this case—with a unique, unchanging condition—I would think it's OK. – Rich Feb 11 '15 at 21:57
  • I'm sorry, I misread the question. I thought you wanted to replace all the tags except for the ones containing batcache. I have modified the answer accordingly. In case you need more matches to exclude I think you can add another negative lookahead to the list in the format "(?!string)". – ntrp Feb 11 '15 at 22:11
  • Maybe `[endif]` isn't perfect; you can replace it with `<![` as in your solution if you prefer. – ntrp Feb 11 '15 at 22:16
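The idea from the comments above (one negative lookahead per phrase, with `<![` in place of `[endif]`) could be sketched as a small helper. The function name `comment_pattern_except` is hypothetical, and the input string is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: build a comment-stripping pattern that skips any
 * comment containing one of the given phrases.
 */
function comment_pattern_except(array $phrases): string
{
    $lookaheads = '';
    foreach ($phrases as $phrase) {
        // preg_quote escapes regex metacharacters and the "/" delimiter.
        $lookaheads .= '(?!' . preg_quote($phrase, '/') . ')';
    }
    return '/<!--(?:' . $lookaheads . '[\s\S])*?-->/';
}

$pattern = comment_pattern_except(['batcache', '<![']);
$html    = "<!-- drop me --><!--[if IE 6]>x<![endif]--><!-- batcache hit -->";

echo preg_replace($pattern, '', $html);
// prints: <!--[if IE 6]>x<![endif]--><!-- batcache hit -->
```

Passing each phrase through `preg_quote` lets phrases such as `<![` be listed literally, without escaping them by hand.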