preg_match issue to retrieve data from table

Question

 <h3 style="border-bottom: 3px solid #CCC;" class="margint15 marginb15">Headlines</h3> 
 <table cellpadding="0" cellspacing="0" border="0" class="nc" width="100%"> 
     <tr>
        <th class="left" colspan="2">Latest Headlines</th>
     </tr>
     <tr>
        <td class="left" width="620"> <a href="/blogs/rhb/79680.jsp" style="color:#06a;">Trading Stocks - 10
     July 2015 - Globetronics | A&M | Salcon | Comintel | Homeritz |
     MMSV</a> </td>
    </tr>
</table>

I want to extract the data from the opening "table" tag with class="nc" through to the closing "table" tag. How do I write the pattern for preg_match?

I want to extract the data from the "table" tag that has class="nc" up to the end of that "table" tag. How do I write the pattern for preg_match? – Lim Neo Dec 26 '15 at 08:46

score 1 · Accepted Answer · edited May 23 '17 at 11:45

Really, this has been discussed here many times: it is better not to use a regular expression to grab HTML tags (though there are cases in which it works quite well). Consider using an XML parser instead. For the sake of the Christmas spirit, here's an example for your purpose (scraping financial data from a site that is not yours ;-)):

<?php
$str='<container>
<h3 style="border-bottom: 3px solid #CCC;" class="margint15
marginb15">Headlines</h3>  <table cellpadding="0" cellspacing="0"
border="0" class="nc" width="100%"> <tr><th class="left"
colspan="2">Latest Headlines</th></tr> <tr><td class="left" width="620"> <a
href="/blogs/rhb/79680.jsp" style="color:#06a;">Trading Stocks - 10
July 2015 - Globetronics | A&amp;M | Salcon | Comintel | Homeritz |
MMSV</a> </td></tr></table>
</container>';
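// simplexml_load_string() returns false when the markup is not well-formed XML
// (for example, a raw "&" instead of "&amp;")
$xml = simplexml_load_string($str);
if ($xml === false) {
    die('Could not parse the markup as XML');
}
print_r($xml);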

// now you can loop over the table rows with
foreach ($xml->table->tr as $row) {
    // do whatever you want with it
    // child elements can be accessed likewise
}
?>

Hint: Obviously, I made up the container tag; in your case the root element is likely to be html.

Appendix: As Scuzzy points out, make yourself familiar with XPath (here's a good starting point). Combined with SimpleXML, it is extremely powerful.
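As a rough sketch of how XPath can narrow the selection to the table with class="nc": the container root and the extra class="other" table below are made up for illustration, and the markup is assumed to be well-formed XML (hence A&amp;amp;M rather than a raw ampersand).

```php
<?php
// Hypothetical sample: the question's markup plus a made-up second table,
// wrapped in a made-up <container> root so that it is a single XML document.
$str = '<container>
<h3 class="margint15 marginb15">Headlines</h3>
<table class="other"><tr><td>ignored</td></tr></table>
<table cellpadding="0" cellspacing="0" border="0" class="nc" width="100%">
<tr><th class="left" colspan="2">Latest Headlines</th></tr>
<tr><td class="left" width="620"><a href="/blogs/rhb/79680.jsp" style="color:#06a;">Trading Stocks - 10 July 2015 - Globetronics | A&amp;M | Salcon | Comintel | Homeritz | MMSV</a></td></tr>
</table>
</container>';

$xml = simplexml_load_string($str);
if ($xml === false) {
    exit("Could not parse the markup as XML\n");
}

// Select only tables whose class attribute is exactly "nc".
$tables = $xml->xpath('//table[@class="nc"]');

$headlines = [];
foreach ($tables as $table) {
    // ".//a" is relative to the current table, so links elsewhere are skipped.
    foreach ($table->xpath('.//a') as $link) {
        $headlines[] = [
            'href'  => (string) $link['href'],
            'title' => trim((string) $link),
        ];
    }
}

print_r($headlines);
```

The predicate [@class="nc"] only matches an attribute value of exactly "nc"; a table with several classes would need something like contains(), which this sketch does not cover.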

answered Dec 26 '15 at 09:11 by Jan · edited May 23 '17 at 11:45 by Community


And making use of [xpath](http://php.net/manual/en/simplexmlelement.xpath.php) is amazingly powerful to locate content. – Scuzzy Dec 26 '15 at 09:15
@Scuzzy: Added your comment to the answer, thanks for pointing it out! – Jan Dec 26 '15 at 09:18
@Scuzzy Though blatantly off-topic, how's the weather in good old Brizzy? – Jan Dec 26 '15 at 09:20
Thank you, it's a better solution which reminds me to improve my previous code! Anyway, regExp is really hard to understand. :) – Lim Neo Dec 26 '15 at 10:54
@Jan quite good at the moment, mix of 30c days and rain here and there. – Scuzzy Dec 26 '15 at 11:00
Also, here's some code that might help if you've got bad HTML to parse, returns a new simplexml object... `function simplexml_import_html($html){$dom = new DOMDocument('1.0','UTF-8');libxml_use_internal_errors(true);$dom->loadHTML(mb_convert_encoding($html,'HTML-ENTITIES','UTF-8'),LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);libxml_clear_errors();return simplexml_import_dom($dom);}` – Scuzzy Dec 26 '15 at 11:08
Thanks, Scuzzy!! I have a strong feeling that your function is really cool, but please forgive me, I couldn't work out how to call it. Would you mind giving an example? Thanks in advance! :) – Lim Neo Dec 27 '15 at 06:21
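One possible way to call Scuzzy's helper, as a sketch only. The function is reformatted from the comment above with one adaptation that is not in the original: mb_convert_encoding() with 'HTML-ENTITIES' is deprecated since PHP 8.2, so this version uses mb_encode_numericentity() when the mbstring extension is available. The sample $html and the wrapping div are made up for illustration.

```php
<?php
// Scuzzy's helper, reformatted. The mb_encode_numericentity() guard is an
// adaptation (assumption), replacing the deprecated 'HTML-ENTITIES' conversion.
function simplexml_import_html($html)
{
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Collect parser warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);

    if (function_exists('mb_encode_numericentity')) {
        // Turn non-ASCII characters into numeric entities so that
        // loadHTML() does not misread UTF-8 input.
        $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
    }

    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return simplexml_import_dom($dom);
}

// Hypothetical input: the raw "&" would make simplexml_load_string() fail,
// but the HTML parser recovers from it.
$html = '<div><h3>Headlines</h3>'
      . '<table class="nc"><tr><td>'
      . '<a href="/blogs/rhb/79680.jsp">Globetronics | A&M | Salcon</a>'
      . '</td></tr></table></div>';

$xml = simplexml_import_html($html);

// Collect href => link text for every link inside the class="nc" table.
$links = [];
foreach ($xml->xpath('//table[@class="nc"]//a') as $a) {
    $links[(string) $a['href']] = trim((string) $a);
}

print_r($links);
```

The input is wrapped in a single div because LIBXML_HTML_NOIMPLIED can nest sibling top-level elements in unexpected ways. The XPath starts with // so that it finds the table wherever it ends up in the tree.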

score 0 · Answer 2 · answered Dec 26 '15 at 09:49

You should go with this:

$str = '<h3 style="border-bottom: 3px solid #CCC;" class="margint15 marginb15">Headlines</h3><table cellpadding="0" cellspacing="0" border="0" class="nc" width="100%"> <tr><th class="left" colspan="2">Latest Headlines</th></tr> <tr><td class="left" width="620"> <a href="/blogs/rhb/79680.jsp" style="color:#06a;">Trading Stocks - 10 July 2015 - Globetronics | A&M | Salcon | Comintel | Homeritz | MMSV</a> </td></tr></table>';
preg_match_all('/<table.*?>(.*?)<\/table>/si', $str, $matches);

echo "<pre>";
print_r( strip_tags($matches[1][0]) );
die();

Thanks!

But I just need the table with the class "nc". Thanks! – Lim Neo Dec 26 '15 at 10:26
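If a regular expression is still wanted, a pattern restricted to the class="nc" table could look roughly like the sketch below. It assumes the class attribute is double-quoted and contains only "nc", and that the table holds no nested tables; the two-table input is made up for illustration. A parser-based approach remains more robust.

```php
<?php
// Hypothetical input: two tables, only the second one has class="nc".
$str = '<table class="other"><tr><td>ignored</td></tr></table>'
     . '<table cellpadding="0" cellspacing="0" border="0" class="nc" width="100%">'
     . '<tr><th class="left" colspan="2">Latest Headlines</th></tr>'
     . '<tr><td class="left" width="620"><a href="/blogs/rhb/79680.jsp">Trading Stocks - 10 July 2015</a></td></tr>'
     . '</table>';

// <table\b[^>]*   the tag name, then any attributes that come before class
// \bclass="nc"    a class attribute that is exactly "nc" (double-quoted)
// [^>]*>          any remaining attributes up to the end of the opening tag
// (.*?)           the table contents, non-greedy, up to the first closing tag
// Modifiers: s lets the dot match newlines, i makes the match case-insensitive.
$pattern = '/<table\b[^>]*\bclass="nc"[^>]*>(.*?)<\/table>/si';

$inner = null;
if (preg_match($pattern, $str, $m) === 1) {
    $inner = $m[1];                        // markup between the table tags
    echo trim(strip_tags($inner)), "\n";   // text only
}
```

Because [^>]* cannot cross a ">", the class="nc" test has to succeed inside the same opening tag, which is what keeps the first table from matching.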
