I have a web page which contains a lot of tables. I cannot change that page, but I need to work with its data in a different application, so I need to be able to parse it and extract some of the data. I am terrible with regular expressions, so I would really appreciate some help with this. I will most likely use the regular expression in a PHP (Laravel) application, if that's relevant to the syntax.
The web page I need to parse contains a lot of these (among other things):
<!-- Post number: 10000 -->
<!-- 127.0.0.1 127.0.0.1 -->
<table class="message" cellspacing="0" cellpadding="0" border="0">
<tr>
<td>
<table cellspacing="0" cellpadding="0" border="0">
<tr>
<td class="tableheader2" nowrap>
<B>Name: </B> Firstname Lastname
</td>
<td class="tableheader2" nowrap>
<a href="url.html?param=10000" target="_blank">
<img src="image.png" alt="Alt message" border="0">
</a>
<a href="url2.html?param2=20000">
<img src="image2.png" alt="Alt message" border="0">
</a>
</td>
<td class="tableheader2" width="100%">
</td>
</tr>
<tr>
<TD class="tableheader2" WIDTH=520 colspan="3">
<b>
Sent:
</b>
2014-01-01 11:00:00<BR>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td class="tableheader2">
<table class="tableheader2" CELLSPACING=0 CELLPADDING=0 BORDER=0>
<tr>
<td>
</td>
<td>
Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quos, amet neque non voluptate facilis natus ullam impedit veritatis libero maiores.
</td>
<td>
</td>
</tr>
</table>
</td>
</tr>
</table>
<hr align="left">
That's just one of many such posts in a long row. I have also edited it a bit (indentation) for readability.
What I need is to be able to parse that entire page and grab all of these elements from each post (I am using the values from the example above, but they could of course be anything):
- 10000 (from Post number comment)
- Firstname Lastname
- 2014-01-01 11:00:00
- Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quos, amet neque non voluptate facilis natus ullam impedit veritatis libero maiores.
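To make the desired result concrete, this is roughly the structure I would like to end up with once the page is parsed. The variable name and the key names are placeholders I made up for illustration; they do not come from the page itself.

```php
<?php
// Desired result for the example post above, one array entry per post.
// $posts and the keys (post_number, name, sent, message) are hypothetical names.
$posts = [
    [
        'post_number' => '10000',
        'name'        => 'Firstname Lastname',
        'sent'        => '2014-01-01 11:00:00',
        'message'     => 'Lorem ipsum dolor sit amet, consectetur adipisicing elit. Quos, amet neque non voluptate facilis natus ullam impedit veritatis libero maiores.',
    ],
    // Further posts from the same page would follow here in the same shape.
];
```

All values can stay plain strings; I can convert the post number and the date myself afterwards.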
Any help with this would be much appreciated. I would have provided sample code, but none of my own futile attempts are even close, so that would probably only be counterproductive.