Regexp in php, take html

Question

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
How to parse and process HTML with PHP?

I need help. I have some HTML and I need a regular expression that gives me one table, and only that table, because other tables follow it. Example HTML:

<table class="results" cellspacing="1" cellpadding="0" border="0" width="100%" align="left">
    <tr><td>text</td></tr>
</table>
<!-style>
tr.bg_selected{}
tr.bg_selected td, tr.bg_checked td { background-color:#ffe9bc !important;}
</style>**AND ANOTHER TABLE**

This is my regular expression. With it I also get all the tables that come after this table.

$regular = "/<table class=\"results\" cellspacing=\"(\d+)\" cellpadding=\"(\d+)\" border=\"(\d+)\" (.*)>(.*)<\/table>\n(.*)<\/style>/s";
preg_match_all($regular, $str, $matches2, PREG_PATTERN_ORDER);

Please, read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Edward Ruchevits, Aug 16 '12 at 13:36
[The Pony, He Comes...](http://stackoverflow.com/a/1732454/1338999) *HTML is not a regular language and hence cannot be parsed by regular expressions.* — Matt, Aug 16 '12 at 13:36
@OcuS can you repeat that comment in a more constructive manner? — Matt, Aug 16 '12 at 13:39
@Matt: Yes, I can: don't use double quotes in PHP; PHP evaluates the string to check whether any variables are hiding in it, which theoretically slows down script execution. (But I'm still p*ssed off :) — OcuS, Aug 16 '12 at 13:45
@OcuS - Don't use execution speed as the reason to use single quotes over double quotes. The speed difference is not significant. Having said that, the example in the question would have been a lot more readable if he had used single quotes to avoid having to escape the quotes in the string. And that's the real point here: code readability is important. Readability (and thus maintainability) should trump optimisation every time (well, almost every time, anyway). — SDC, Aug 16 '12 at 14:10

SDC · Answer 1 · 2012-08-16T14:24:48.070

1

Some people have pointed out in the comments that you "can't parse HTML in regex". This isn't entirely accurate; it can be done.

However, it's difficult and error prone, and at the end of it you get a bit of a messy structure to work with.
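
To illustrate the fragility, here is a minimal sketch of a regex that stops at the first closing tag. The pattern in the question uses a greedy `(.*)`, which runs on to the last `</table>` in the document; a lazy `.*?` stops at the first one. The sample string and the names `$pattern` and `$firstTable` are assumptions made for this illustration.

```php
<?php
// Sample input modeled on the question: the wanted table, a style block, another table.
$str = '<table class="results" cellspacing="1" cellpadding="0" border="0"><tr><td>text</td></tr></table>'
     . "\n<style>tr.bg_selected{}</style>"
     . '<table class="other"><tr><td>unwanted</td></tr></table>';

// [^>]* skips the remaining attributes of the opening tag.
// .*? is lazy, so the match ends at the first </table>, not the last one.
// The s modifier lets the dot match newlines inside the table.
$pattern = '/<table class="results"[^>]*>.*?<\/table>/s';

// preg_match() returns 1 when the pattern matched; $m[0] holds the whole match.
$firstTable = preg_match($pattern, $str, $m) === 1 ? $m[0] : null;

echo $firstTable, "\n";
```

This only holds while the markup stays simple: a table nested inside the results table would end the match too early, and the pattern fails if `class` is not the first attribute or is quoted differently.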

I would therefore strongly recommend using PHP's built-in HTML parser instead. It's very simple to use:

$doc = new DOMDocument();
$doc->loadHTML($htmlCode);

You can then work with the resulting object to extract the data you need.

$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table){
    $cells = $table->getElementsByTagName('td');
    foreach ($cells as $cell){
        echo $cell->nodeValue;
    }
}
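
To get only the one table the question asks about, something like the following could work. It is a sketch, assuming the wanted table is the first one whose class contains `results`; the sample markup and the names `$xpath`, `$query`, `$resultsTable` and `$tableHtml` are made up for the illustration.

```php
<?php
// Sample markup modeled on the question: the wanted table followed by another one.
$htmlCode = '<table class="results" cellspacing="1"><tr><td>text</td></tr></table>'
          . '<style>tr.bg_selected{}</style>'
          . '<table class="other"><tr><td>unwanted</td></tr></table>';

$doc = new DOMDocument();
// Real-world HTML is often invalid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($htmlCode);
libxml_clear_errors();

// Select tables whose class attribute contains the word "results",
// then take the first match.
$xpath = new DOMXPath($doc);
$query = '//table[contains(concat(" ", normalize-space(@class), " "), " results ")]';
$resultsTable = $xpath->query($query)->item(0);

// saveHTML() with a node argument serializes just that element (PHP 5.3.6+).
$tableHtml = $resultsTable !== null ? $doc->saveHTML($resultsTable) : null;

echo $tableHtml, "\n";
```

Unlike a regex, this keeps working if the attribute order changes or if another table is nested inside the one you want.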

See the PHP manual for more info: http://php.net/manual/en/book.dom.php

edited Aug 16 '12 at 14:24

answered Aug 16 '12 at 14:18

SDC


Always use DOM! Perfect answer. – Ionut Flavius Pogacian Aug 16 '12 at 14:26
@IonutFlaviusPogacian - well, maybe not *always*, but certainly in anything but the most trivial case. – SDC Aug 16 '12 at 14:33
