This is trivial to do using a proper HTML parser. This program demonstrates using HTML::TreeBuilder and the look_down method.
It is searching for all elements with:
- A tag name of 'img'
- A src attribute that matches the regex qr|^/file\?id=|
- A class attribute that matches the null regex (i.e. a class attribute with any value)
- An alt attribute that matches the null regex
You don't say what you want to do with the elements once you've found them. This code just uses as_HTML to display them.
use strict;
use warnings;
use HTML::TreeBuilder;
my $html = HTML::TreeBuilder->new_from_file(\*DATA);
my @images = $html->look_down(
    _tag  => 'img',
    src   => qr|^/file\?id=|,
    class => qr//,
    alt   => qr//
);
print $_->as_HTML, "\n" for @images;
__DATA__
<html>
<head>
<title>Page title</title>
</head>
<body>
<img rel="lightbox[45451]" src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
<img src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
<img src="/file" class="bbc_img" alt="myimagess.jpg"> /* mismatch id="" */
<img src="/file?id=13166" alt="myimagess.jpg"> /* no class="" */
<img src="/file?id=13166" class="bbc_img"> /* no alt="" */
</body>
</html>
output
<img alt="myimagess.jpg" class="bbc_img" rel="lightbox[45451]" src="/file?id=13166" />
<img alt="myimagess.jpg" class="bbc_img" src="/file?id=13166" />
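If you need the attribute values rather than the serialised elements, the attr method of HTML::Element returns the value of a named attribute. The following is only a sketch of that idea, not something taken from your question: the $markup string is made-up sample data reduced from the test input above, and the @found list of hashes is just one possible way to collect the results.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup: one image that satisfies every
# criterion and one that has no class attribute
my $markup = <<'END_HTML';
<html>
<body>
<img src="/file?id=13166" class="bbc_img" alt="myimagess.jpg">
<img src="/file?id=13166" alt="no-class.jpg">
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($markup);

# Same criteria as in the program above
my @images = $tree->look_down(
    _tag  => 'img',
    src   => qr|^/file\?id=|,
    class => qr//,
    alt   => qr//,
);

# Collect the src and alt values of each matching element
my @found = map { +{ src => $_->attr('src'), alt => $_->attr('alt') } } @images;

printf "%s (%s)\n", $_->{src}, $_->{alt} for @found;

# Release the parse tree once it is no longer needed
$tree->delete;
```

Only the first image should be reported here, since the second one has no class attribute.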