0

Possible Duplicate:
How to parse and process HTML with PHP?

I'm working on a piece of code that should get the contents of one very specific HTML tag in a given HTML document.

$html = "<html>..........truncated.........<div>blablabla<br />xy</div>.....";
$dom = new DomDocument();
$dom->loadHTML($html);

$divs = $dom->getElementsByTagName('div');

echo $divs->item(0)->nodeValue.'<br>';

The HTML is just an example, but it shows the problem I'm experiencing: I want to get the content of this DIV, and I NEED the inner tags to be kept! What nodeValue (as well as "textContent") does is return the contents of the correct node with all inner tags stripped (http://docs.php.net/manual/en/class.domnode.php).

I'm out of ideas on how to get this right at the moment. What I need is the equivalent of JavaScript's "innerHTML", but I can't find such a method :(

How do I get this right?

xenonite

3 Answers

1

This solution looks promising:

http://www.linked.com.mt/blog/code/php/php-domnode-tostring-xml/

$temp_doc = new DOMDocument('1.0', 'UTF-8');
$temp_node = $temp_doc->importNode($myDomNode, TRUE);
$temp_doc->appendChild($temp_node);
$my_node_as_string = $temp_doc->saveHTML();
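
Note that the snippet above serializes the imported node itself, so the wrapping tag is part of the result. For something closer to JavaScript's "innerHTML", one possible sketch is to serialize each child of the node and concatenate the pieces. The helper name `innerHTML` is made up for illustration and is not part of the DOM extension; the sketch assumes PHP 7 or later (for the type declarations) and relies on `DOMDocument::saveHTML()` accepting an optional node argument, which it does since PHP 5.3.6.

```php
<?php
// Hypothetical helper (not a built-in): returns the markup of all children
// of $node, without the tag of $node itself.
// Assumes $node belongs to a document, i.e. it is not the DOMDocument itself.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // Serialize one child (text node or element) as HTML.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Same kind of markup as in the question, shortened to a complete document.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div>blablabla<br />xy</div></body></html>');

$div = $dom->getElementsByTagName('div')->item(0);
$inner = innerHTML($div);

echo $inner, "\n";
```

The inner `br` tag survives in `$inner`, while the surrounding `div` does not. The serializer may normalize the markup, so the output is not guaranteed to match the input byte for byte.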
Andrew67
0

Have you seen phpQuery? It might be too much for what you're trying to accomplish, but it's worth taking a look at.

Marko
0

DOM is only good at parsing well-formed and 100% valid XML, so unless you're using 100% valid XHTML, it's going to fail.

What you want to use is the PHP Simple HTML DOM Parser library.

There are a great many tutorials on that site to help you with what you need.

Theodore R. Smith
  • DOM can parse real world HTML fine when you load it with `DOMDocument::loadHTML` or `DOMDocument::loadHTMLFile`. This will then utilize libxml's HTML Parser module. – Gordon Feb 12 '12 at 11:09
  • In my experience, that's iffy at best. However, last time I tried was in 2009. Perhaps it's improved. – Theodore R. Smith Feb 21 '12 at 16:13
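
For anyone who wants to check the `DOMDocument::loadHTML` claim from the comments themselves, here is a rough sketch. The sloppy markup in `$html` is made up for illustration. `libxml_use_internal_errors()` keeps any parser complaints out of the PHP warning output, so they can be inspected through `libxml_get_errors()` instead.

```php
<?php
// Made-up, deliberately sloppy markup: unclosed <p> and <b>, no <html> or <body>.
$html = '<div><p>first<p>second <b>bold</div>';

// Collect libxml parse messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Grab whatever the parser reported, then restore the previous setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$paragraphs = $dom->getElementsByTagName('p');

echo 'loaded: ', var_export($loaded, true), "\n";
echo 'parser messages: ', count($errors), "\n";
echo 'paragraphs found: ', $paragraphs->length, "\n";
```

How well this works on a particular page still depends on how broken the markup is, so it is worth trying with your own input.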