
Alright. So I'm using DOMDocument to read HTML files. One thing I've noticed is that when I do this:

$doc = new DOMDocument();
$doc->loadHTML($htmlstring);
echo $doc->saveHTML();

it adds a doctype declaration as well as <html> and <body> tags.
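A minimal reproduction of that behaviour, assuming PHP with the standard dom extension; the fragment string is made up for illustration. The fragment has no doctype and no <html>/<body> of its own, yet the serialized output contains all three.

```php
<?php
// A fragment with no doctype and no <html>/<body> wrapper of its own.
$fragment = '<p>Hello</p><p>World</p>';

$doc = new DOMDocument();
$doc->loadHTML($fragment);
$out = $doc->saveHTML();

// $out now begins with a default doctype and wraps the fragment
// in <html><body>...</body></html>.
echo $out;
```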

I've worked around this by doing the following:

$doc = new DOMDocument();
$doc->loadXML($htmlstring,LIBXML_NOXMLDECL);
echo $doc->saveXML();

The problem with this, however, is that my tags are now case-sensitive, and the parser rejects the input if it has more than one root element.
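To see both limitations concretely, the sketch below feeds three invented fragments to loadXML(). libxml_use_internal_errors() is used only to collect the parser messages instead of emitting PHP warnings. The fragment with two roots and the one with mismatched tag case both fail to load; wrapping the content in a single root fixes only the first problem.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$fragments = [
    'two roots'  => '<p>One</p><p>Two</p>',           // not well-formed XML
    'mixed case' => '<P>Text</p>',                    // XML tag names are case-sensitive
    'one root'   => '<div><p>One</p><p>Two</p></div>', // single root: loads fine
];

$results = [];
foreach ($fragments as $label => $xml) {
    $doc = new DOMDocument();
    // loadXML() returns false when the input is not well-formed.
    $results[$label] = $doc->loadXML($xml);
    foreach (libxml_get_errors() as $error) {
        echo $label, ': ', trim($error->message), "\n";
    }
    libxml_clear_errors();
}

var_dump($results);
```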

Is there an alternative that lets me load partial HTML files, grab tags and such, replace them, and get the string back without having to parse the files manually?

Basically, I want the functionality of DOMDocument->loadHTML without the added tags and doctype.

Any ideas?

Kelly Elton

2 Answers


In theory you could tell libxml not to add the implied markup. In practice, PHP's libxml bindings do not provide any means to do that. If you are on PHP 5.3.6+, pass the root node of your partial document to saveHTML(), which will then give you the outerHTML of that element, e.g.

$dom->saveHTML($dom->getElementsByTagName('body')->item(0));

would only return the <body> element with children. See
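A self-contained sketch of the saveHTML($node) approach, assuming PHP 5.3.6+ and the bundled dom extension; the sample markup is invented for illustration. It serializes first the whole <body> and then a single paragraph, and neither result contains the doctype.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<p>First</p><p>Second</p>');

// Passing a node to saveHTML() (PHP 5.3.6+) serializes only that node
// and its descendants (its "outerHTML"), not the whole document.
$bodyHtml = $dom->saveHTML($dom->getElementsByTagName('body')->item(0));

// The same works for any element, e.g. just the first paragraph.
$firstP = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));

echo $bodyHtml, "\n", $firstP, "\n";
```

Depending on the libxml version, node-level output may contain extra newlines between block elements, so compare results with that in mind.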

Also note that your partial document with multiple root elements only works because loadHTML adds the implied elements. If you want a partial with multiple roots (or rather no root at all) back, you can add a fake root yourself:

$dom->loadHTML('<div id="partialroot">' . $partialDoc . '</div>');

Then process the document as needed and fetch the innerHTML of that fake root.
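A minimal sketch of the fake-root approach. The innerHTML() function is a hypothetical helper (DOMNode has no built-in innerHTML), the sample fragment is made up, and replacing <b> with <strong> merely stands in for "process the document as needed". It relies on saveHTML($node), so it assumes PHP 5.3.6+.

```php
<?php
/**
 * Hypothetical helper: concatenates the serialized child nodes of $node,
 * i.e. its "innerHTML". saveHTML($node) requires PHP 5.3.6+.
 */
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$partialDoc = '<p>One</p><p>Two <b>bold</b></p>';

$dom = new DOMDocument();
// Wrap the fragment in a fake root so it has a single, known parent element.
$dom->loadHTML('<div id="partialroot">' . $partialDoc . '</div>');

// Example processing step: replace every <b> with <strong>.
// Copy the live node list first, because replacing nodes mutates it.
$bolds = iterator_to_array($dom->getElementsByTagName('b'));
foreach ($bolds as $b) {
    $strong = $dom->createElement('strong');
    while ($b->firstChild !== null) {
        $strong->appendChild($b->firstChild); // moves the child node
    }
    $b->parentNode->replaceChild($strong, $b);
}

// Fetch only the content of the fake root, without the wrapper div,
// the implied <html>/<body> tags, or the doctype.
$result = innerHTML($dom->getElementById('partialroot'));
echo $result;
```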

Also see "How do you parse and process HTML/XML in PHP?" for additional parsers you might want to try.

Gordon

You can wrap each fragment in a div with a specific id, and then extract just that div from the document object using its id.
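A sketch of that idea, assuming the markup is parsed with loadHTML() (for HTML documents libxml treats "id" attributes as IDs, so getElementById() works without a DTD) and PHP 5.3.6+ for saveHTML($node). The ids "header" and "content" are invented for illustration.

```php
<?php
// Hypothetical input: each fragment of interest wrapped in a div with an id.
$html = '<div id="header"><h1>Title</h1></div>'
      . '<div id="content"><p>Body text</p></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// In a document parsed by loadHTML(), "id" attributes are registered as IDs,
// so getElementById() finds each wrapper div directly.
$content = $dom->getElementById('content');

// Serialize only that div (its outerHTML); saveHTML($node) needs PHP 5.3.6+.
$contentHtml = $dom->saveHTML($content);
echo $contentHtml;
```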

Kris