I need a regex that will match everything that is not inside a <div>
tag. For example, given:
foobar<p>lol</p><div>something</div>blahblah
it should match foobar<p>lol</p>
and blahblah
As Mat and maenu have already pointed out, using regexps to parse HTML is, to say the least, error-prone. Since you tagged your question with perl, here is a small example using HTML::TokeParser::Simple, which I think is a good choice for this kind of manipulation.
use strict;
use warnings;
use HTML::TokeParser::Simple;
my $parser = HTML::TokeParser::Simple->new( *DATA );
my $is_in_div = 0;    # nesting depth: 0 means we are outside every <div>
while ( my $token = $parser->get_token ) {
    if ( $token->is_start_tag( 'div' ) ) {
        $is_in_div++;
        next;
    }
    if ( $token->is_end_tag( 'div' ) ) {
        $is_in_div-- if $is_in_div > 0;    # ignore a stray </div>
        next;
    }
    print $token->as_is if not $is_in_div;
}
__DATA__
foobar<p>lol</p><div>something</div>blahblah
foobar<p>lol</p><div>more stuff<div>something</div></div>blahblah
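The script above prints everything outside the <div> elements as one stream. If you want the pieces separately (foobar<p>lol</p> and blahblah, as in the question), the same token loop can collect them into a list. This is only a sketch: fragments_outside_div is a made-up helper name, and it assumes that passing a scalar reference to new() makes the parser read from a string, as the underlying HTML::TokeParser does.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper (not part of the module): returns the chunks of
# $html that lie outside every <div>, one list element per chunk.
sub fragments_outside_div {
    my ($html) = @_;

    # Assumption: a scalar reference means "parse this string".
    my $parser    = HTML::TokeParser::Simple->new( \$html );
    my $depth     = 0;       # current <div> nesting depth
    my @fragments = ('');    # the last element is the chunk being built

    while ( my $token = $parser->get_token ) {
        if ( $token->is_start_tag('div') ) {
            # An outermost <div> ends the current chunk, if it has content.
            push @fragments, '' if $depth == 0 && length $fragments[-1];
            $depth++;
            next;
        }
        if ( $token->is_end_tag('div') ) {
            $depth-- if $depth > 0;    # a stray </div> is dropped
            next;
        }
        $fragments[-1] .= $token->as_is if $depth == 0;
    }

    # Drop an empty trailing chunk (input that ends with </div>).
    return grep { length } @fragments;
}

print "[$_]\n" for fragments_outside_div(
    'foobar<p>lol</p><div>more stuff<div>something</div></div>blahblah'
);
```

Because the depth counter tracks nesting, the inner <div> in this input does not end the skipped region early.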
I'm not sure what you're trying to accomplish, and a big caveat is that this won't work on all HTML (nested <div> elements, for example, will break it), but the following might do the trick:
#!/opt/perl/bin/perl
use strict;
use warnings;
use 5.010;
my $html = 'foobar<p>lol</p><div>something</div>blahblah';
my @fragments = split(m{<div\b[^>]*>.*?</div>}is, $html);
say foreach @fragments;
See perldoc -f split and perldoc perlre for more info.
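To make the caveat concrete, here is a small sketch that runs the same split pattern on input with a nested <div>. The variable name $nested and the sample string are just for illustration. The non-greedy .*? stops at the first </div> it finds, so the outer closing tag leaks into the result.

```perl
use strict;
use warnings;

# Sample input with one <div> nested inside another (illustrative only).
my $nested = 'foobar<p>lol</p><div>more stuff<div>something</div></div>blahblah';

# Same pattern as above: the match ends at the FIRST </div>, not the matching one.
my @fragments = split m{<div\b[^>]*>.*?</div>}is, $nested;

print "[$_]\n" for @fragments;
# prints:
# [foobar<p>lol</p>]
# [</div>blahblah]
```

For input like this, a tokenizing parser that counts nesting depth is the safer route.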