From the code in your post, it looks like you are trying to capture the bgcolor
attribute for each table cell in a given row. Not all of the cells have a bgcolor
set, but some of them do. Here's how you can extract that information using HTML::TreeBuilder:
use HTML::TreeBuilder 5 -weak;
my $html = q{<td nowrap align="right">DOLNOŚLĄSKIE</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">4</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">1</td><td nowrap align="right" bgcolor=#D0E0D0 >1</td><td nowrap align="right">3</td><td nowrap align="right" bgcolor=#D0E0D0 >6</td><td nowrap align="right">1</td><td nowrap align="right" bgcolor=#D0E0D0 >2</td><td nowrap align="right">1</td><td nowrap align="right" bgcolor=#D0E0D0 >19</td><td nowrap align="right">0</td></tr>};
my $t = HTML::TreeBuilder->new_from_content($html);
foreach my $col ( $t->look_down('_tag','tr')->content_list ) {
    print $col->attr('bgcolor'), "\n" if defined $col->attr('bgcolor');
}
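As a variation on the loop above, look_down can also match on the attribute itself, so the code does not depend on locating the row first. This is only a sketch: the helper name bgcolors and the shortened sample row are my own, not something from your post.

```perl
use strict;
use warnings;
use HTML::TreeBuilder 5 -weak;

# Hypothetical helper (the name is made up for this example):
# returns the bgcolor value of every <td> that has one.
sub bgcolors {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In list context look_down returns every matching element.
    # The qr/./ criterion keeps only cells with a non-empty bgcolor.
    return map { $_->attr('bgcolor') }
        $tree->look_down( _tag => 'td', bgcolor => qr/./ );
}

# Shortened sample row, assumed to resemble the markup in the question.
my $sample = q{<table><tr><td nowrap align="right">4</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td></tr></table>};

print "$_\n" for bgcolors($sample);
```

Because the filter is part of the look_down call, cells without a bgcolor never reach the map, and the "if defined" check is no longer needed.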
I'm sure you need to retrieve more than that, but it's all we can determine from the vague description and incomplete code in your question.
But the point is solid: don't parse HTML with regexes, parse HTML with an HTML parser. The learning curve is slightly steeper at the beginning, but the result will be more robust and easier to maintain, and the skill you learn will apply to any HTML document, not just this particular one.
HTML::TreeBuilder comes with some good documentation, but you've got to read a good portion of it to make sense of the whole thing.
There's another HTML parsing module, Mojo::DOM, which comes with the Mojolicious framework. Personally, I find it easier to use, but sometimes when I post examples people seem to jump to the conclusion that they have to load some heavyweight web framework to use it (which isn't entirely true, but I'm tired of swimming upstream. ;). You might want to have a look at it and see if it better fits your taste. Here's an example:
use Mojo::DOM;
my $html = q{<td nowrap align="right">DOLNOŚLĄSKIE</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">0</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">4</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">1</td><td nowrap align="right" bgcolor=#D0E0D0 >1</td><td nowrap align="right">3</td><td nowrap align="right" bgcolor=#D0E0D0 >6</td><td nowrap align="right">1</td><td nowrap align="right" bgcolor=#D0E0D0 >2</td><td nowrap align="right">1</td><td nowrap align="right" bgcolor=#D0E0D0 >19</td><td nowrap align="right">0</td></tr>};
for my $td ( Mojo::DOM->new($html)->find('td[bgcolor]')->each ) {
    print $td->attr('bgcolor'), "\n";
}
Both of those code examples will produce the following output:
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
#D0E0D0
...which probably isn't terribly useful, but is exactly what the code you posted seems to want to capture. At least it's a starting point that you should be able to adapt to your own needs.
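As one way of adapting it, here is a minimal sketch that keeps each cell's text together with its bgcolor, so you can tell which values sit in the shaded columns. The helper name cells_with_color and the shortened sample row are assumptions of mine for illustration, not anything taken from your code.

```perl
use strict;
use warnings;
use utf8;
use Mojo::DOM;

# Hypothetical helper (the name is made up for this example):
# returns one hash ref per <td>, holding the cell text and its
# bgcolor, which is undef when the attribute is not set.
sub cells_with_color {
    my ($html) = @_;
    return map { +{ text => $_->text, bgcolor => $_->attr('bgcolor') } }
        Mojo::DOM->new($html)->find('td')->each;
}

# Shortened sample row, assumed to resemble the markup in the question.
my $sample = q{<td nowrap align="right">DOLNOŚLĄSKIE</td><td nowrap align="right" bgcolor=#D0E0D0 >0</td><td nowrap align="right">4</td>};

# The sample contains non-ASCII characters, so encode the output.
binmode STDOUT, ':encoding(UTF-8)';

for my $cell ( cells_with_color($sample) ) {
    my $color = defined $cell->{bgcolor} ? $cell->{bgcolor} : '(none)';
    print "$cell->{text}\t$color\n";
}
```

Selecting plain 'td' instead of 'td[bgcolor]' keeps the unshaded cells too, which is probably what you want if the position of each value in the row matters.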
I believe the documentation for Mojo::DOM is more approachable, which might just make the difference, especially if you're new to Perl. My recommendation would be to start there, and build your solution around that module. In the long run you'll be much better off than tearing your hair out using regexes to extract data from HTML.
The Mojolicious distribution installs in under a minute on most systems, and includes the Mojo::DOM module, which on its own is quite lightweight. It's a good option.