Multiple Condition in grep regex

Question

I want to use grep to search a file for all lines containing <div, except that I do not want lines between <!-- and -->.

I have this regex to find the lines containing <div: ^.*\<div.*$
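A quick check of this pattern, assuming GNU grep and a hypothetical sample file. grep already prints the whole matching line, so the ^.* and .*$ parts add nothing. Also, in GNU grep \< is a start-of-word anchor rather than an escaped <, so the pattern also accepts a bare word "div".

```shell
# Hypothetical sample input for the demo, removed when the script exits.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '<div class="x">' 'a div element' '<span>' > "$sample"

# \< is a word-start anchor in GNU grep, so this matches the first two lines.
grep '^.*\<div.*$' "$sample"

# A plain literal is enough to match only lines containing <div.
grep '<div' "$sample"
```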

I have this regex to exclude the lines between <!-- and -->: ^((?!\<!--.*?--\>).)*$, but it doesn't work. It only matches when the comment is on a single line. How can I fix that?

How can I combine these in one grep line? Or do I have to type two greps?
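For reference, a sketch of how far a single grep can get, assuming GNU grep built with PCRE support (-P) and a hypothetical sample file. The negative lookahead rejects any line holding a complete <!-- ... --> comment, and .*<div then requires <div on the same line. Because grep tests each line on its own, a <div inside a multi-line comment still gets through.

```shell
# Hypothetical sample input for the demo, removed when the script exits.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<div id="a">
<!-- <div id="b"> -->
<!--
<div id="c">
-->
EOF

# Both conditions in one PCRE pattern: no complete comment on the line, and <div present.
# This prints <div id="a"> but also <div id="c">, which sits inside a multi-line comment.
grep -P '^(?!.*<!--.*?-->).*<div' "$sample"
```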

Don't parse HTML with regex; it won't work! http://stackoverflow.com/a/1732454/1682509 Use an XML/HTML parser instead; this will save you a lot of pain. – Reeno, Oct 25 '13 at 18:07

ruakh · Accepted Answer · 2013-10-25T20:31:05.533


grep does not support multiline searches like your search for <!-- ... -->. This can be worked around with various helper commands, but in your case it's not worth it. It's better to just use a more powerful language, such as sed, AWK, or Perl:

perl -ne '$on = 1 if m/<!--/; $on = "" if m/-->/; print if !$on and m/<div/' FILE
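Since the comments below ask about AWK: a sketch of an AWK equivalent of the one-liner above, with the same line-by-line logic and a hypothetical sample file. As with the Perl version, a line holding both <!-- and --> clears the flag again, so single-line comments are not excluded.

```shell
# Hypothetical sample input for the demo, removed when the script exits.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<div id="a">
<!--
<div id="b">
-->
<div id="c">
EOF

# Set "on" at a line containing <!--, clear it at a line containing -->,
# and print lines containing <div only while "on" is unset.
awk '/<!--/ { on = 1 } /-->/ { on = 0 } !on && /<div/' "$sample"
```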

Edited to add: If you also want to discount instances of <!-- ... --> that open and close on a single line, you can write:

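perl -ne ' my $line = $_;
           # inside a multi-line comment: strip everything up to the closing -->
           if ($in_comment && s/.*?-->//) {
               $in_comment = "";
           }
           # strip each <!-- ... --> on the line; an unclosed <!-- strips to
           # end of line and opens a multi-line comment
           while (!$in_comment && s/<!--.*?(-->|$)//) {
               $in_comment = 1 if $1 ne "-->";
           }
           print $line if !$in_comment && m/<div/
         ' FILE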
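AWK has no non-greedy .*?, so an AWK take on this version has to walk each line with index() and substr(). A sketch, assuming the same rule as the Perl script: print the original line if <div appears outside comments and no comment is left open at the end of the line. The function name div_lines_outside_comments and the sample file are hypothetical.

```shell
# Hypothetical sample input for the demo, removed when the script exits.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<div id="a">
<!-- <div id="b"> -->
<div id="c"> <!-- note -->
<!--
<div id="d">
-->
<div id="e">
EOF

# Hypothetical helper: prints lines with <div outside <!-- ... --> comments.
div_lines_outside_comments() {
    awk '
    {
        line = $0; rest = $0
        # Inside a multi-line comment: skip the line unless it holds the closing -->.
        if (in_comment) {
            i = index(rest, "-->")
            if (!i) next
            rest = substr(rest, i + 3)
            in_comment = 0
        }
        # Cut out every <!-- ... --> on this line; an unclosed <!-- opens a comment.
        while ((i = index(rest, "<!--")) > 0) {
            tail = substr(rest, i + 4)
            j = index(tail, "-->")
            if (j) {
                rest = substr(rest, 1, i - 1) substr(tail, j + 3)
            } else {
                rest = substr(rest, 1, i - 1)
                in_comment = 1
                break
            }
        }
        # Print the original line if no comment is left open and <div remains.
        if (!in_comment && rest ~ /<div/) print line
    }' "$@"
}

div_lines_outside_comments "$sample"
```

With this input, the lines for a, c and e are printed; b (single-line comment) and d (multi-line comment) are skipped.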

edited Oct 25 '13 at 20:31

answered Oct 25 '13 at 18:05

ruakh


Thanks for your help. Can I use these regexes in AWK? I have tried `perl -ne '$on = "" if m//; print if $on && `m/
` (two tab between `` how can I remove the first one?
– MSK Oct 25 '13 at 19:59
@MSK: Oh, I misunderstood what you wanted. I thought you were saying that you wanted to remove *lines* between `<!--` and `-->`; so, I didn't think your input could contain instances of `<!-- ... -->` all on one line. I'll update. – ruakh Oct 25 '13 at 20:25
