For example, I create a DOMDocument like this:

<?php

$implementation = new DOMImplementation();

$dtd =
  $implementation->createDocumentType
  (
    'html',                                     // qualifiedName
    '-//W3C//DTD XHTML 1.0 Transitional//EN',   // publicId
    'http://www.w3.org/TR/xhtml1/DTD/xhtml1-'
      .'transitional.dtd'                       // systemId
  );

$document = $implementation->createDocument('', '', $dtd);

$elementHtml     = $document->createElement('html');
$elementHead     = $document->createElement('head');
$elementBody     = $document->createElement('body');
$elementTitle    = $document->createElement('title');
$textTitre       = $document->createTextNode('My bweb page');
$attrLang        = $document->createAttribute('lang');
$attrLang->value = 'en';

$document->appendChild($elementHtml);
$elementHtml->appendChild($elementHead);
$elementHtml->appendChild($attrLang);
$elementHead->appendChild($elementTitle);
$elementTitle->appendChild($textTitre);
$elementHtml->appendChild($elementBody);

Now, suppose I have an XHTML string like this:

<?php
$xhtml = '<h1>Hello</h1><p>World</p>';

How can I import it into the <body> node of my DOMDocument?

So far, the only solution I have found is something like this:

<?php
$simpleXmlElement = new SimpleXMLElement($xhtml);

$domElement = dom_import_simplexml($simpleXmlElement);

$domElement = $document->importNode($domElement, true);

$elementBody->appendChild($domElement);

This solution seems very poor to me, and it causes problems, for example when I try it with a string like this:

<?php
$xhtml = '<p>Hello&nbsp;World</p>';

OK, I can work around this problem by converting the XHTML entities to Unicode entities, but that is so ugly...

Any help?

Thanks in advance!

Related question :

Pascal Qyy

2 Answers

The problem is that DOM does not know it should consider the XHTML DTD unless you validate the document against it. Until you do that, DOM doesn't know any of the entities defined in the DTD, nor any of its other rules. Fortunately, we sorted out how to do the validation in that other question, so armed with that knowledge you can do

$document->validate(); // anywhere before importing the other DOM

And then import with

$fragment = $document->createDocumentFragment();
$fragment->appendXML('<h1>Hello</h1><p>Hello&nbsp;World</p>');
$document->getElementsByTagName('body')->item(0)->appendChild($fragment);
$document->formatOutput = TRUE;
echo $document->saveXml();

outputs:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>My bweb page</title>
  </head>
  <body>
    <h1>Hello</h1>
    <p>Hello&nbsp;World</p>
  </body>
</html>
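
Since validate() has to read the DTD named by the systemId, one possible way to avoid fetching it from w3.org on every request is libxml_set_external_entity_loader() (available from PHP 5.4, so newer than this thread). The following is only a sketch under assumptions: the helper name makeLocalDtdLoader and the ./dtd directory are hypothetical, and the DTD and its three entity files are assumed to have been downloaded there beforehand.

```php
<?php
// Hypothetical helper: builds a resolver that maps public identifiers
// to local file paths. Nothing here is fetched over HTTP.
function makeLocalDtdLoader(array $publicIdToPath): callable
{
    return function ($publicId, $systemId, $context) use ($publicIdToPath) {
        $key = (string) $publicId;
        // Unmapped identifiers are not resolved by this loader.
        return isset($publicIdToPath[$key]) ? $publicIdToPath[$key] : null;
    };
}

// Assumption: local copies of the XHTML 1.0 Transitional DTD and the
// entity sets it references live in this (hypothetical) directory.
$dtdDir = __DIR__ . '/dtd';

libxml_set_external_entity_loader(makeLocalDtdLoader([
    '-//W3C//DTD XHTML 1.0 Transitional//EN'  => $dtdDir . '/xhtml1-transitional.dtd',
    '-//W3C//ENTITIES Latin 1 for XHTML//EN'  => $dtdDir . '/xhtml-lat1.ent',
    '-//W3C//ENTITIES Symbols for XHTML//EN'  => $dtdDir . '/xhtml-symbol.ent',
    '-//W3C//ENTITIES Special for XHTML//EN'  => $dtdDir . '/xhtml-special.ent',
]));
```

With the loader registered, a later $document->validate() should resolve the DTD through the map instead of the network. Whether the entities then become known to appendXML() in the same way as with the remote DTD is something to verify on your own setup.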

The other way to import XML into another DOM is to use importNode:

$one = new DOMDocument;
$two = new DOMDocument;
$one->loadXml('<root><foo>one</foo></root>');
$two->loadXml('<root><bar><sub>two</sub></bar></root>');
$bar = $two->documentElement->firstChild; // we want to import the bar tree
$one->documentElement->appendChild($one->importNode($bar, TRUE));
echo $one->saveXml();

outputs:

<?xml version="1.0"?>
<root><foo>one</foo><bar><sub>two</sub></bar></root>

However, this cannot work with

<h1>Hello</h1><p>Hello&nbsp;World</p>

because when you load a document into DOM, DOM will overwrite everything you told it before about the document. Thus, when using load, libxml (and thus SimpleXML, DOM and XMLReader) does not know you mean XHTML. It does not know any of the entities defined there either and will complain about them instead. But even if the string did not contain the entity, it would not be valid XML, because it lacks a root node. That's why you use the fragment.
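
The three cases described above can be checked with a small script. This is a minimal sketch on a bare DOMDocument with no doctype and no validation, not the asker's document; the variable names are only illustrative.

```php
<?php
// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// 1. Two top-level elements: not a well-formed XML document (no single root).
$standalone = new DOMDocument();
$loadedWithoutRoot = $standalone->loadXML('<h1>Hello</h1><p>World</p>');

// 2. The same string as a fragment: a fragment needs no single root.
$document = new DOMDocument();
$body = $document->appendChild($document->createElement('body'));
$fragment = $document->createDocumentFragment();
$appendedPlain = $fragment->appendXML('<h1>Hello</h1><p>World</p>');
$body->appendChild($fragment);

// 3. A fragment using an entity that no loaded DTD has declared.
$entityFragment = $document->createDocumentFragment();
$appendedEntity = $entityFragment->appendXML('<p>Hello&nbsp;World</p>');

libxml_clear_errors();
var_dump($loadedWithoutRoot, $appendedPlain, $appendedEntity);
```

Case 1 and case 3 are expected to be rejected by the parser, while case 2 should leave the two elements inside the body.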

Gordon
  • OK, I can import after validating my document, but that means that if I want to use a DOMDocument template instead of a text template, I have to validate EACH TIME I insert some content, that is, each time I serve a page. So I would request a DTD that is always the same, because PHP takes no notice of HTTP status codes, and I would flood w3.org... – Pascal Qyy Nov 03 '10 at 07:17
  • @G. Qyy Set the HTTP stream context to use a caching proxy then. – Gordon Nov 03 '10 at 08:00
  • I'm actually interested in this issue too, but isn't setting up a caching proxy a rather heavy solution to such a problem? (Excuse my poor English.) – Service Informatique Nov 03 '10 at 08:27
  • @Service That depends on your setup. On a small shared hosting site, it might be too heavy (and not even possible), but using a reverse/caching proxy to serve static content on medium to large sites can seriously increase the performance of your application. On a completely unrelated topic: do you have any suggestions for a cozy French cuisine restaurant in Paris in the mid-price range? – Gordon Nov 03 '10 at 08:38
  • @Service and @G. Qyy: another option might be to [unregister](http://de.php.net/manual/en/function.stream-wrapper-unregister.php) the [regular HTTP](http://de.php.net/manual/en/wrappers.php) [stream wrapper](http://de.php.net/manual/en/book.stream.php) and replace it with your own implementation that reads the DTD from a local copy. – Gordon Nov 03 '10 at 09:14
  • @Gordon Sorry, I hate restaurants: they are expensive for such random quality... In addition, there are people and noise, and the hygiene is never up to standard! – Service Informatique Nov 03 '10 at 10:15
  • @Gordon "option might be to unregister the regular HTTP stream wrapper[...]": any idea how to do that (PHP is not my favorite language, I'm not at ease with it)? – Service Informatique Nov 03 '10 at 10:18
  • @Service Sounds like I should reconsider dining in Paris :) But back on topic: I'm not familiar with writing stream wrappers, so I'd have to look into that first. But have a look at that [example from the PHP Manual](http://de.php.net/manual/en/stream.streamwrapper.example-1.php) – Gordon Nov 03 '10 at 10:38
  • @Gordon Thank you, but I hope there's a simpler solution... And as for restaurants, that is my opinion whatever the city or the country! – Service Informatique Nov 03 '10 at 10:48
  • @Service You could use a systemId that points to a local file instead of the W3C URL. That will only work for documents you created during the request though. Unlike for RelaxNG and Schema validation, there is no validateSource function that would allow you to pass in a DTD as a string. You could validate against the [XHTML schema](http://www.w3.org/TR/xhtml1-schema/) but schemas do not include entities. – Gordon Nov 03 '10 at 11:03
  • OK, so after some research, it seems impossible to import an XML string without either validating the DOMDocument or converting the XHTML entities to Unicode. So I chose the least burdensome solution: converting the entities with this function: http://pastebin.com/6ADYFLHa. Thank you very much for your help – Pascal Qyy Nov 04 '10 at 07:21
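
The entity conversion chosen in the last comment could look roughly like the sketch below. It is not the function from the pastebin link (which is not reproduced here); the name decodeNamedEntities is hypothetical, and it assumes the HTML 4.01 entity table and UTF-8 input. It replaces named entities with the literal UTF-8 character and leaves the five entities XML predefines untouched, so the markup stays well-formed.

```php
<?php
// Hypothetical helper: turn named HTML entities (e.g. &nbsp;) into the
// UTF-8 characters they stand for, keeping XML's own five entities.
function decodeNamedEntities(string $xhtml): string
{
    $xmlPredefined = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $match) use ($xmlPredefined): string {
            if (in_array($match[1], $xmlPredefined, true)) {
                return $match[0]; // must stay escaped in XML
            }
            // Unknown names come back unchanged from html_entity_decode().
            return html_entity_decode($match[0], ENT_QUOTES | ENT_HTML401, 'UTF-8');
        },
        $xhtml
    );
}

// Usage: the converted string no longer depends on the XHTML DTD.
$document = new DOMDocument('1.0', 'UTF-8');
$body = $document->appendChild($document->createElement('body'));
$fragment = $document->createDocumentFragment();
$ok = $fragment->appendXML(decodeNamedEntities('<p>Hello&nbsp;World &amp; more</p>'));
if ($ok) {
    $body->appendChild($fragment);
}
```

One limitation of this regex-based approach: it also rewrites entity-like text inside CDATA sections, which a real parser would leave alone.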

You can use a DOMDocumentFragment for this:

$fragment = $document->createDocumentFragment();
$fragment->appendXml($xhtml);
$elementBody->appendChild($fragment);

That's all there is to it...

Edit: Well, if you must have XHTML (instead of valid XML), you could use this dirty workaround:

function xhtmlToDomNode($xhtml) {
    $dom = new DOMDocument();
    // loadHTML() uses the lenient HTML parser, which knows entities such as &nbsp;
    $dom->loadHTML('<html><body>'.$xhtml.'</body></html>');
    $fragment = $dom->createDocumentFragment();
    $body = $dom->getElementsByTagName('body')->item(0);
    // childNodes is a live list, so move nodes until the body is empty
    while ($body->firstChild) {
        $fragment->appendChild($body->firstChild);
    }
    return $fragment;
}

usage:

$fragment = xhtmlToDomNode($xhtml);
// importNode() returns a copy owned by $document; append that copy
$fragment = $document->importNode($fragment, true);
$elementBody->appendChild($fragment);
ircmaxell
  • Unfortunately, same problem as with SimpleXMLElement: "Warning: DOMDocumentFragment::appendXML(): Entity: line 1: parser error : Entity 'nbsp' not defined" – Pascal Qyy Nov 02 '10 at 19:05
  • @G. Qyy: Edited answer with another possible solution – ircmaxell Nov 02 '10 at 19:12
  • XHTML with a properly declared DTD IS valid XML, and your solution bypasses the DOMImplementation with its DTD declaration, sorry... Thanks a lot for your willingness to help. Anything else? – Pascal Qyy Nov 02 '10 at 19:21