How to parse an HTML page with PHP5 and DOM?

Question

Possible Duplicate:
How to parse and process HTML with PHP?

I'm working on a piece of code that should get the contents of one very specific HTML tag from a given HTML document.

$html = "<html>..........truncated.........<div>blablabla<br />xy</div>.....";
$dom = new DomDocument();
$dom->loadHTML($html);

$divs = $dom->getElementsByTagName('div');

echo $divs->item(0)->nodeValue.'<br>';

The HTML is just an example, but it shows the exact problem I'm experiencing: I want to get the content of this DIV and I NEED the inner tags to be kept! What nodeValue (as well as "textContent") does is return the contents of the correct node with all inner tags stripped (http://docs.php.net/manual/en/class.domnode.php).

I'm out of ideas on how to get this right at the moment. What I need is the equivalent of JavaScript's "innerHTML", but I can't find such a method :(

How do I get this right?

score 1 · Answer 1 · answered Sep 15 '10 at 22:08

This solution looks promising:

http://www.linked.com.mt/blog/code/php/php-domnode-tostring-xml/

$temp_doc = new DOMDocument('1.0', 'UTF-8');
$temp_node = $temp_doc->importNode($myDomNode, TRUE);
$temp_doc->appendChild($temp_node);
$my_node_as_string = $temp_doc->saveHTML();
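
Note that the snippet above serializes the imported node itself, so the result includes the outer tag. If only the inner markup is wanted (closer to JavaScript's innerHTML), one possible sketch is to serialize each child node and concatenate the results. The function name innerHTML() is made up for this example and is not part of the DOM extension. It also assumes PHP 5.3.6 or later, where saveHTML() accepts a node argument; on older versions, saveXML($child) may serve as a substitute.

```php
<?php
// Hypothetical helper (not part of PHP's DOM extension): returns the markup
// of a node's children, roughly like JavaScript's innerHTML.
function innerHTML(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Passing a node to saveHTML() assumes PHP >= 5.3.6.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>blablabla<br />xy</div></body></html>');

// First <div> in the document, as in the question.
$div = $dom->getElementsByTagName('div')->item(0);
echo innerHTML($div), "\n";
```

This should print the div's contents with the inner tag kept. libxml's HTML serializer writes the line break as <br> rather than <br />.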

score 0 · Answer 2 · answered Sep 15 '10 at 21:47

Have you seen phpQuery? Might be too much for what you're trying to accomplish but it's worth taking a look at.

Marko


score 0 · Accepted Answer · answered Sep 16 '10 at 15:36

DOM is only good at parsing well-formed and 100% valid XML, so unless you're using 100% valid XHTML, it's going to fail.

What you want to use is the PHP Simple HTML DOM Parser library.

There are a great many tutorials on that site to help you with what you need.

Theodore R. Smith


DOM can parse real world HTML fine when you load it with `DOMDocument::loadHTML` or `DOMDocument::loadHTMLFile`. This will utilize libxml's HTML Parser module then. – Gordon Feb 12 '12 at 11:09
In my experience, that's iffy at best. However, last time I tried was in 2009. Perhaps it's improved. – Theodore R. Smith Feb 21 '12 at 16:13
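
A quick way to check this on your own setup is to feed loadHTML() some deliberately broken markup and see what comes back. The sketch below is only an illustration: the sample markup is made up, and libxml_use_internal_errors(true) is used so that parser complaints are collected instead of being emitted as PHP warnings.

```php
<?php
// Collect libxml parse messages internally instead of raising PHP warnings.
libxml_use_internal_errors(true);

// Made-up, deliberately malformed markup: unclosed <p> and <b>,
// and no <html> or <body> wrapper.
$broken = '<div><p>first<p>second <b>bold</div>';

$doc = new DOMDocument();
$loaded = $doc->loadHTML($broken);

// Fetch whatever the parser complained about, then clear the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

$paragraphs = $doc->getElementsByTagName('p');
echo 'loaded: ', var_export($loaded, true), "\n";
echo 'paragraphs found: ', $paragraphs->length, "\n";
echo 'parser messages: ', count($errors), "\n";
```

On a typical libxml build the document should load and both paragraphs should be recovered, since the HTML parser closes the open tags itself. The number of parser messages can differ between libxml versions.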
