How can I find all strings between < and > while excluding some special tags like b, i, ul, ol, li, p?
Is there a shorter solution to the following?
while ($html =~ /<(\w+)>/g) {
print "found $1\n" if $1 ne 'b' && $1 ne 'ul' && $1 ne 'p' ...
}
You can use a library; Mojo::DOM makes this easy:
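use feature 'say';   # say is not enabled by default
use Mojo::DOM;
my $dom = Mojo::DOM->new($html);
# Select every element except the listed tags
for ( $dom->find(':not(b,i,ul,ol,li,p)')->each ) {
    say;   # prints $_, which stringifies to the element's HTML
}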
You also now have the HTML parsed and can process it further as needed.
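If only the tag names are wanted, as in the question, the same selector can be combined with the collection's map method. The following is a minimal sketch, assuming Mojolicious is installed; the sample $html string and the @tags variable are made up for illustration and are not from the question.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical sample input, not from the original question
my $html = '<div><p>Hi <b>bold</b> <span>x</span></p><em>y</em></div>';

my $dom = Mojo::DOM->new($html);

# find() returns a Mojo::Collection; map('tag') calls ->tag on each element
my @tags = $dom->find(':not(b,i,ul,ol,li,p)')->map('tag')->each;

print "found $_\n" for @tags;
```

Unlike the regex approach, this should also pick up elements that carry attributes, since the parser works on elements rather than on the literal text between < and >.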
You can use a negative lookahead:
while ($html =~ /<(?!(?:b|ul|p)>)(\w+)>/g) {
print "found $1\n"
}
Details:
< - a < char
(?!(?:b|ul|p)>) - a negative lookahead that fails the match if, immediately to the right of the current location, there is b, ul or p followed by a > char
(\w+) - Capturing group 1: one or more word chars
> - a > char.
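To cover the full list from the question without writing the alternation by hand, the pattern can be built from an array. This is a sketch only; the sample $html string and the names @skip, $skip_re and @found are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input, not from the original question
my $html = '<p>Some <b>bold</b> and <i>italic</i> text <div><span>x</span></div></p><ul><li>y</li></ul>';

# Tags to exclude, taken from the list in the question
my @skip = qw(b i ul ol li p);

# Build the alternation b|i|ul|ol|li|p, escaping each name defensively
my $skip_re = join '|', map { quotemeta } @skip;

my @found;
while ($html =~ /<(?!(?:$skip_re)>)(\w+)>/g) {
    push @found, $1;   # tag name captured by group 1
}

print "found $_\n" for @found;
```

As with the original pattern, this only matches opening tags of the form <name> with no attributes; closing tags and tags such as <a href="..."> are not matched.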