I want to use grep to search a file for all lines containing <div, except that I do not want lines between <!-- and -->.

I have this regex to find the lines containing <div:   ^.*\<div.*$

I have this regex to exclude the lines between <!-- and -->:   ^((?!\<!--.*?--\>).)*$   but it doesn't work: it only matches when the comment is on a single line. How can I fix that?

How can I combine these in one grep line? Or do I have to type two greps?

ruakh
MSK
  • I would use awk (http://www.gnu.org/software/gawk/). – fred02138 Oct 25 '13 at 18:04
  • Don't parse HTML with regex, it won't work! http://stackoverflow.com/a/1732454/1682509 Use an XML/HTML parser instead; this will save you a lot of pain. – Reeno Oct 25 '13 at 18:07

1 Answer

grep matches one line at a time, so it does not support multiline searches like your search for <!-- ... -->. This can be worked around with various helper commands, but in your case it's not worth it. It's better to just use a more powerful tool, such as sed, AWK, or Perl:

perl -ne '$on = 1 if m/<!--/; $on = "" if m/-->/; print if !$on and m/<div/' FILE

Edited to add: If you also want to discount instances of <!-- ... <div ... --> on a single line, you can write:

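perl -ne ' my $line = $_;
           if ($in_comment) {
               next unless s/.*?-->//;        # still inside a comment: skip the line
               $in_comment = "";
           }
           s/<!--.*?-->//g;                   # drop comments that open and close on this line
           $in_comment = 1 if s/<!--.*//s;    # an unclosed comment continues on later lines
           print $line if m/<div/             # test what is left, print the original line
         ' FILE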
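The same idea (strip comment text from a working copy of each line, remember whether a comment is still open, and test only what is left) can also be sketched in AWK. This is not from the original answer: the function name div_outside_comments, the variable names, and the sample input are all hypothetical, and it treats <!-- and --> as plain strings rather than parsing HTML.

```shell
# Hypothetical helper: print lines of file $1 that contain <div outside comments.
div_outside_comments() {
    awk '
    {
        text = $0                                # working copy; $0 stays intact for printing
        if (in_comment) {
            stop = index(text, "-->")
            if (stop == 0) next                  # whole line is inside a comment
            text = substr(text, stop + 3)        # keep only what follows the closing -->
            in_comment = 0
        }
        # Remove each comment that opens on this line.
        while ((start = index(text, "<!--")) > 0) {
            rest = substr(text, start + 4)
            stop = index(rest, "-->")
            if (stop == 0) {                     # unclosed: comment continues on later lines
                text = substr(text, 1, start - 1)
                in_comment = 1
                break
            }
            text = substr(text, 1, start - 1) substr(rest, stop + 3)
        }
        if (index(text, "<div") > 0) print $0
    }' "$1"
}

# Made-up sample input covering single-line and multi-line comments.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<div id="a">
<!-- <div id="b"> -->
<p> <!-- open
<div id="c">
--> <div id="d">
<div id="e"> <!-- tail
-->
EOF

awk_result=$(div_outside_comments "$sample")
printf '%s\n' "$awk_result"
rm -f "$sample"
```

On this sample the lines with ids a, d, and e are printed, since their <div sits outside any comment, while b and c are skipped.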
ruakh
  • Thanks for your help. Can I use these regexes in AWK? I have tried `perl -ne '$on = "" if m//; print if $on && `m/
    ` (two tab between `` how can i remove first one.
    – MSK Oct 25 '13 at 19:59
  • @MSK: Oh, I misunderstood what you wanted. I thought you were saying that you wanted to remove *lines* between `<!--` and `-->`; so, I didn't think your input could contain instances of `<!-- ... <div ... -->` all on one line. I'll update. – ruakh Oct 25 '13 at 20:25