I have a problem: I want to load an HTML snippet that contains namespaced elements with DOMDocument:
<div class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu">
</div>
</div>
But I can't figure out how to preserve the namespaces. I tried loading it with loadHTML(), but HTML has no concept of namespaces, so the prefixes get stripped.
I tried loading it with loadXML(), but that doesn't work either, because <my:text value="huhu"> is not well-formed XML (the tag is never closed and the my prefix is never declared).
What I need is a loadHTML() method that doesn't strip namespaces, or a loadXML() method that tolerates markup that is not well-formed. In other words, a combination of the two methods.
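For reference, the loadXML() attempt can be reproduced roughly like this. It is only a minimal sketch: the variable names ($xmlDoc, $loaded, $errors) are chosen here for illustration, and the exact error messages depend on the installed libxml2 version.

```php
<?php
// Same snippet as in the question: <my:text> is never closed
// and the "my" prefix is never declared.
$html = '<div class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu">
</div>
</div>';

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$xmlDoc = new DOMDocument();
// loadXML() returns false when the input is not well-formed XML.
$loaded = $xmlDoc->loadXML($html);

// Show what libxml2 complained about.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
libxml_clear_errors();

var_dump($loaded);
```

So the XML parser does know about prefixes, but it gives up on the markup before any namespaced node can be queried.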
My code so far:
$html = '<div class="something-first">
<div class="something-child something-good another something-great">
<my:text value="huhu">
</div>
</div>';
libxml_use_internal_errors(true);
$domDoc = new DOMDocument();
$domDoc->formatOutput = false;
$domDoc->resolveExternals = false;
$domDoc->substituteEntities = false;
$domDoc->strictErrorChecking = false;
$domDoc->validateOnParse = false;
$domDoc->loadHTML($html/*, LIBXML_NOERROR | LIBXML_NOWARNING*/);
$xpath = new DOMXPath($domDoc);
$xpath->registerNamespace('my', 'http://www.example.com/');
// -----> This returns zero nodes because the namespace prefix gets stripped by loadHTML()
$nodes = $xpath->query('//my:*');
var_dump($nodes);
Is there a way to achieve what I want? I would appreciate any advice.
EDIT: I opened an enhancement request for libxml2 to provide an option to preserve namespaces in HTML: https://bugzilla.gnome.org/show_bug.cgi?id=711670