Perl's WWW::Mechanize::Firefox has successfully retrieved the contents of the web page and stored them in the scalar variable $content.
my $url = 'http://finance.yahoo.com/quote/AAPL/financials?p=AAPL';
$mech->get($url);
my $content= $mech->content();
In examining $content, I'm interested in identifying and saving all the text between the span tags inside the table.
There are various classes that I have no interest in.
Attempt # 1 did not work.
my $tree = HTML::TreeBuilder->new_from_content($txtRawData);
my @list = $mech->find('span');
foreach (@list) {
    print $_->as_HTML();
}
Attempt # 2 did not work.
foreach my $tag ($tree->look_down(_tag => 'span')) {
    my $value = $tag->as_text;
}
The HTML table of interest is:
<div class="Mt(10px)">
<table class="Lh(1.7) W(100%) M(0)">
<tbody>
<tr class="Bdbw(1px) Bdbc($lightGray) Bdbs(s) H(36px)">
<td class="Fw(b) Fz(15px)">
<span>Revenue</span>
</td>
<td class="C($gray) Ta(end)">
<span>9/24/2016</span>
</td>
<td class="C($gray) Ta(end)">
<span>9/26/2015</span>
</td>
<td class="C($gray) Ta(end)">
<span>9/27/2014</span>
</td>
</tr>
<tr class="Bdbw(1px) Bdbc($lightGray) Bdbs(s) H(36px)">
<td class="Fz(s) H(35px) Va(m)">
<span>Total Revenue</span>
</td>
<td class="Fz(s) Ta(end)">
<span>
<span>215,639,000</span>
</span>
</td>
<td class="Fz(s) Ta(end)">
<span>
<span>233,715,000</span>
</span>
</td>
<td class="Fz(s) Ta(end)">
<span>
<span>182,795,000</span>
</span>
</td>
</tr>
<tr class="Bdbw(1px) Bdbc($lightGray) Bdbs(s) H(36px)">
<td class="Fz(s) H(35px) Va(m)">
<span>Cost of Revenue</span>
</td>
<td class="Fz(s) Ta(end)">
<span>
<span>131,376,000</span>
</span>
</td>
<td class="Fz(s) Ta(end)">
<span>
<span>140,089,000</span>
</span>
</td>
<td class="Fz(s) Ta(end)">
<span>
<span>112,258,000</span>
</span>
</td>
</tr>
<tr class="Bdbw(0px)! H(36px)">
<td class="Fw(b) Fz(s) Pb(20px)">
<span>Gross Profit</span>
</td>
<td class="Fw(b) Fz(s) Ta(end) Pb(20px)">
<span>
<span>84,263,000</span>
</span>
</td>
<td class="Fw(b) Fz(s) Ta(end) Pb(20px)">
<span>
<span>93,626,000</span>
</span>
</td>
<td class="Fw(b) Fz(s) Ta(end) Pb(20px)">
<span>
<span>70,537,000</span>
</span>
</td>
</tr>
</tbody>
</table>
</div>
What is the best way to select (set focus upon) one specific table (there could be multiple tables inside the $content variable) and save the text between the span tags to an array, which will be passed to the next procedure for insertion into a database table?
I also would like to highlight that:
- Sometimes the text is inside two (nested) sets of span tags.
- There is no table header row (no th tags).
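For reference, here is a minimal sketch of the kind of result I'm after, not a confirmed solution. It assumes HTML::TreeBuilder is installed and that the table can be picked out by its exact class string as it appears in the HTML above (that string may change whenever the site is redesigned). The helper name extract_table_rows and the shortened sample markup are made up for illustration. It reads the text of each td rather than each span, so a value wrapped in two nested spans is collected only once.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any library): returns one array ref of
# trimmed cell texts per row of the first table whose class matches exactly.
sub extract_table_rows {
    my ($html, $table_class) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # In scalar context look_down returns the first matching element (or undef).
    my $table = $tree->look_down(_tag => 'table', class => $table_class);

    my @rows;
    if ($table) {
        for my $tr ($table->look_down(_tag => 'tr')) {
            my @cells;
            for my $td ($tr->look_down(_tag => 'td')) {
                # as_text flattens nested spans, so doubly wrapped values appear once.
                my $text = $td->as_text;
                $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
                push @cells, $text;
            }
            push @rows, \@cells if @cells;
        }
    }

    $tree->delete;    # release the parse tree
    return @rows;
}

# Shortened sample based on the table above; in the real script the first
# argument would be $content.
my $sample = <<'HTML';
<div class="Mt(10px)">
<table class="Lh(1.7) W(100%) M(0)">
<tbody>
<tr><td><span>Revenue</span></td><td><span>9/24/2016</span></td></tr>
<tr><td><span>Total Revenue</span></td><td><span><span>215,639,000</span></span></td></tr>
</tbody>
</table>
</div>
HTML

my @rows = extract_table_rows($sample, 'Lh(1.7) W(100%) M(0)');
print join(' | ', @$_), "\n" for @rows;
```

Each element of @rows is an array ref holding the row label followed by its values, which seems like a convenient shape to hand to a database insert routine.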