I want to extract info from an website link:
http://www.website.com
There is a string that appears few times: "STRING TO CAPTURE", but I want to capture the FIRST time appears. It will be inside the following structure:
<td width="10%" bgcolor="#FFFFFF"><font class="bodytext9">1-Jun-2013</font></td>
<td width="4%" bgcolor="#FFFFFF" align=center><font class="bodytext9">Sat</font></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">TIME</font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="link1">Some Text here</a></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/pink.gif"></font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">Another Text</font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/white.gif"></font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="link2">Here is also Text</a></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a href="LINKtoWeb" class=list><u>STRING TO CAPTURE</u></a></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><a target="_new" href="AnotherLink"><img src="img/img2.gif" border="0"></a></td>
</tr>
This is a fix format, where between the is 12 lines start with and all other tags; I want to extract the text in each line, eg.
1-Jun-2013
Sat
TIME
Some Text here
...
STRING TO CAPTURE
and I also want to extract the link at line contain "STRING TO CAPTURE" which is:
LINKtoWeb
In my opinion, python could be very functional to do this task, but I also too new to python to get it works, hope python experts here can show me how. I have no idea where to start, search around and find this could be solution:
use YAML;
my $data = Load(http://www.website.com);
say $data->{"<tr>"}->{"<td>"}->{"STRING TO CAPTURE"};
But I don't know how to deal with all the texts in these 12 lines ?