In my case, the files all live in a specified directory and its subdirectories. I need to delete HTML comments from them without deleting conditional comments for IE browsers, such as <!--[if IE 9]>. Here is a sample file:

<!DOCTYPE html>
<!-- Should not delete the conditional below -->
<!--[if IE 9]>
<html>
<![endif]-->
<head></head>
<body>
<p>Some content</p>
<!--Single line without space-->
<!-- Single line with spaces -->
<!-- Multi
     Line
     Comment -->

<div>Content</div>
</body>
</html>
hoijui
John
  • But conditional rules are HTML comments, so you're really trying to delete *some* HTML comments. – melpomene Jul 29 '15 at 00:06
  • maybe `` – maraca Jul 29 '15 at 00:17
  • if `.` doesn't stand for newlines then you can replace it by `(.|\s)` – maraca Jul 29 '15 at 00:23
  • Can we play a fun game of *"you post a regex which you think will parse HTML comments except conditionals, and we post ways it won't"*? To @maraca's approach, I counter with `hello`: it's not a comment, but it will be removed as if it were. @Steven Penny's might get confused by `foo bar -->`. This possibly isn't valid HTML, but Chrome makes `bar` visible. Also, removing comments might remove live JavaScript ( http://www.w3schools.com/tags/tag_comment.asp - Tips and Notes section). Use an HTML parser. – TessellatingHeckler Jul 29 '15 at 02:24
  • @TessellatingHeckler obviously it doesn't work for stuff like that, I know Zalgo the pony or whatever, but sometimes you can do it supervised with an unsafe regex (e.g. using an advanced text editor) instead of bothering with parsing. – maraca Jul 29 '15 at 02:35
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – maraca Jul 29 '15 at 02:45
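
The "use an HTML parser" suggestion from the comments could look roughly like the sketch below. It is only an illustration, not something posted in the thread: it assumes Python's standard `html.parser` module, and the names `CommentStripper` and `strip_comments` are made up for this example. The parser re-emits everything it sees except comments, and keeps a comment only when its body starts with `[if` or `<![endif]`. Because `html.parser` treats the contents of `<script>` and `<style>` as raw text, comment-like strings inside JavaScript should pass through untouched. End tags are re-emitted in lowercase, so the output is not guaranteed to be byte-identical outside the removed comments.

```python
from html.parser import HTMLParser


class CommentStripper(HTMLParser):
    """Re-emit the document, dropping comments that are not IE conditionals."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as it appeared.
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.parts.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.parts.append(f"<![{data}]>")

    def handle_comment(self, data):
        # Assumption: a comment is "conditional" if its body starts with
        # "[if" or "<![endif]"; every other comment is dropped.
        body = data.lstrip()
        if body.startswith("[if") or body.startswith("<![endif]"):
            self.parts.append(f"<!--{data}-->")


def strip_comments(html):
    """Return html without ordinary comments; conditional comments stay."""
    parser = CommentStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling `strip_comments()` on the sample file from the question should leave the `<!--[if IE 9]>` block in place and drop the other four comments.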

1 Answer

This will not handle more complex cases, but could get you started:

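#!/usr/bin/awk -f
# Skip every line from one containing "<!--" through one containing "-->".
# Lines opening a conditional comment ("<!--[if") do not start a skip.
/<!--/ && !/<!--\[if/ {comm = 1}
comm == 0
/-->/ {comm = 0}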
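
For comparison, here is a rough Python sketch of the same idea that works on whole files instead of whole lines, so a comment sitting next to real markup does not take that markup with it. It also walks a directory tree, as the question asks. Everything in it is an assumption for illustration: the function names `strip_comments` and `strip_tree`, the `*.html` glob, and UTF-8 as the file encoding. It is still regex-based, so the objections raised in the comments (comment-like text inside scripts, malformed markup) apply to it as well.

```python
import re
from pathlib import Path

# Any HTML comment, matched non-greedily and across line breaks.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Openers and closers of IE conditional comments that must be kept.
CONDITIONAL = re.compile(r"<!--(\[if\s|<!\[endif\])")


def strip_comments(html: str) -> str:
    """Remove ordinary comments; leave conditional comments untouched."""
    def replace(match):
        text = match.group(0)
        return text if CONDITIONAL.match(text) else ""
    return COMMENT.sub(replace, html)


def strip_tree(root: str, pattern: str = "*.html") -> None:
    """Rewrite every matching file under root, including subdirectories."""
    for path in Path(root).rglob(pattern):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_comments(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
```

Since `strip_tree()` overwrites files in place, it would be sensible to try it on a copy of the directory first.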
Zombo