Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
How to parse and process HTML with PHP?

I need help. I have some HTML and I need a regular expression that gives me a table: only one table, because other tables come after it. Example HTML:

<table class="results" cellspacing="1" cellpadding="0" border="0" width="100%" align="left">
    <tr><td>text</td></tr>
</table>
<style>
tr.bg_selected{}
tr.bg_selected td, tr.bg_checked td { background-color:#ffe9bc !important;}
</style>**AND ANOTHER TABLE**

This is my regular expression, but with it I also get all the tables that come after this one:

$regular = "/<table class=\"results\" cellspacing=\"(\d+)\" cellpadding=\"(\d+)\" border=\"(\d+)\" (.*)>(.*)<\/table>\n(.*)<\/style>/s";
preg_match_all($regular, $str, $matches2, PREG_PATTERN_ORDER);
  • Please, read this: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Edward Ruchevits Aug 16 '12 at 13:36
  • [The Pony, He Comes...](http://stackoverflow.com/a/1732454/1338999) *HTML is not a regular language and hence cannot be parsed by regular expressions.* – Matt Aug 16 '12 at 13:36
  • @OcuS can you repeat that comment in a more constructive manner? – Matt Aug 16 '12 at 13:39
  • @Matt: Yes, I can: don't use double quotes in PHP; PHP scans a double-quoted string to find out whether any variables are hiding in it, which theoretically slows down script execution. (But I'm still p*ssed off :) – OcuS Aug 16 '12 at 13:45
  • @OcuS - Don't use execution speed as the reason to use single quotes over double quotes. The speed difference is not significant. Having said that, the example in the question would have been a lot more readable if he had used single quotes to avoid having to escape the quotes in the string. And that's the real point here: code readability is important. Readability (and thus maintainability) should trump optimisation every time (well, almost every time, anyway). – SDC Aug 16 '12 at 14:10
  • @SDC: That's why I wrote "theoretically slows down". – OcuS Aug 16 '12 at 15:27
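
To illustrate SDC's readability point, here is a shortened version of the question's pattern (trimmed to the first two attributes purely for brevity) written both ways. In a double-quoted string PHP turns `\"` into `"` and leaves the unrecognised `\d` alone, so both literals produce the same pattern string; the single-quoted one just needs no escaping.

```php
<?php
// Double-quoted, as in the question: every literal " must be written as \".
$double = "/<table class=\"results\" cellspacing=\"(\d+)\"/";

// Single-quoted: no " escaping and no variable interpolation to think about.
$single = '/<table class="results" cellspacing="(\d+)"/';

$html = '<table class="results" cellspacing="1" cellpadding="0">';

// Both literals yield the identical pattern, so the matches are identical too.
preg_match($double, $html, $m1);
preg_match($single, $html, $m2);

var_dump($double === $single); // bool(true)
```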

1 Answer

Some people have pointed out in the comments that you "can't parse HTML in regex". This isn't entirely accurate; it can be done.

However, it's difficult and error prone, and at the end of it you get a bit of a messy structure to work with.
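
For the question's input specifically, the pattern overshoots because, with the `s` modifier, `.*` is greedy: it runs to the end of the input and backtracks only as far as the *last* `</table>`. A minimal sketch, assuming the results table is never nested inside another table and using a simplified pattern (the question's trailing `<\/style>` part is dropped), shows that the lazy `.*?` stops at the *first* `</table>` instead:

```php
<?php
$html = <<<'HTML'
<table class="results" cellspacing="1" cellpadding="0" border="0" width="100%" align="left">
    <tr><td>text</td></tr>
</table>
<table class="other"><tr><td>other</td></tr></table>
HTML;

// Greedy (.*): the match ends at the LAST </table>, swallowing the second table.
preg_match('/<table class="results"[^>]*>(.*)<\/table>/s', $html, $greedy);

// Lazy (.*?): the match ends at the FIRST </table>, i.e. just the results table.
preg_match('/<table class="results"[^>]*>(.*?)<\/table>/s', $html, $lazy);

echo $lazy[0], PHP_EOL;
```

This still breaks as soon as tables are nested or the attributes change order, which is exactly the fragility described above.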

I would therefore strongly recommend using PHP's built-in HTML parser (the DOM extension) instead. It's very simple to use:

$doc = new DOMDocument();
$doc->loadHTML($htmlCode);

You can then work with the resulting object to extract the data you need.

$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table){
    $cells = $table->getElementsByTagName('td');
    foreach ($cells as $cell){
        echo $cell->nodeValue;
    }
}
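
Since the question wants only the table with `class="results"`, a `DOMXPath` query can select that one element instead of looping over every table. A minimal sketch, assuming the DOM extension is enabled and the `class` attribute is exactly `results` (a table with several classes would need `contains()` in the query); passing a node to `saveHTML()` returns the markup of just that element:

```php
<?php
$html = '<table class="results"><tr><td>text</td></tr></table>'
      . '<style>tr.bg_selected{}</style>'
      . '<table class="other"><tr><td>other</td></tr></table>';

// Real-world HTML is rarely valid; keep libxml warnings out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only tables whose class attribute is exactly "results"; take the first one.
$table = $xpath->query('//table[@class="results"]')->item(0);

$tableHtml = '';
$cellTexts = [];

if ($table !== null) {
    // Outer HTML of this table alone (saveHTML() accepts a node since PHP 5.3.6).
    $tableHtml = $doc->saveHTML($table);

    // Cell texts, as in the loop above but restricted to this one table.
    foreach ($table->getElementsByTagName('td') as $cell) {
        $cellTexts[] = $cell->nodeValue;
    }
}
```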

See the PHP manual for more info: http://php.net/manual/en/book.dom.php

SDC