PHP XML DOM parsing mixed content

Question

I have an XML document whose structure is defined by an XSD file. The document contains content similar to the following:

<foo>
   <bar>text <element a="1" b="2" c="3" /> and some more text</bar>
   <bar>Just text</bar>
</foo>

I want to use PHP to parse it and bring back one of the attribute values (which one is decided elsewhere in the code) inline with the rest of the text. For this example I want the "b" attribute, so the output should be:

"text 2 and some more text"
"Just text"

I am having trouble getting the output in this format because I cannot find a way either to split the node's text so that I can insert the attribute value, or to output the raw XML of the node.

My preference is to use PHP's DOMDocument for this. I have not learnt XPath, but I am willing to learn it if it makes this task possible. I would also consider changing the format of the nested node, although that would be a last resort.

I am using DOMDocument to find the node:

$xml = new DOMDocument();
$xml->load(XMLPATH);
$node = $xml->getElementsByTagName("element")->item(0);

Then all of the following ignore the nested element:

$node->nodeValue;
$node->C14N();

I have also followed this guide to no avail: How to get innerHTML of DOMNode? (http://stackoverflow.com/questions/2087103/how-to-get-innerhtml-of-domnode)

Thanks for your help.

score 0 · Accepted Answer · answered Jan 17 '17 at 18:21


You can use XPath: text() selects the text nodes, @b selects the attribute, and the union operator | combines the two so that the nodes come back in document order:

$xml = <<<EOD
<foo>
   <bar>text <element a="1" b="2" c="3" /> and some more text</bar>
   <bar>Just text</bar>
</foo>
EOD;

$doc = new DOMDocument();
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
$nodeList = $xpath->query('//foo//text() | //foo//element/@b', $doc);

$result = '';

// DOMNodeList is traversable, so foreach visits the nodes in the order returned.
foreach ($nodeList as $node) {
    $result .= $node->textContent;
}
echo $result;

The result is:

   text 2 and some more text
   Just text
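
If one string per bar element is wanted, as in the question, the same union can be evaluated relative to each bar in turn. The following is only a sketch under assumptions: the function name renderBars and the attribute-name check are hypothetical and not part of the answer above, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns one string per <bar>, with each nested
// <element> replaced by the value of the attribute named in $attr.
function renderBars(string $xml, string $attr): array
{
    // The attribute name is spliced into the XPath expression,
    // so only a plain name is accepted here.
    if (!preg_match('/^[A-Za-z_][A-Za-z0-9_.-]*$/', $attr)) {
        throw new InvalidArgumentException("Unexpected attribute name: $attr");
    }

    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $xpath = new DOMXPath($doc);

    $lines = [];
    foreach ($xpath->query('//bar') as $bar) {
        $line = '';
        // The leading "." makes both paths relative to the current <bar>.
        foreach ($xpath->query(".//text() | .//element/@$attr", $bar) as $node) {
            $line .= $node->nodeValue;
        }
        $lines[] = $line;
    }
    return $lines;
}

$xml = '<foo>
   <bar>text <element a="1" b="2" c="3" /> and some more text</bar>
   <bar>Just text</bar>
</foo>';

foreach (renderBars($xml, 'b') as $line) {
    echo '"' . $line . '"' . PHP_EOL;
}
```

For the sample document this prints "text 2 and some more text" and "Just text" on separate lines; passing 'c' instead of 'b' would insert 3 in place of 2.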

Martin Honnen

Wow, such a simple solution. I guess I will have to learn XPath now. As an additional question: would it be faster to navigate a DOMDocument using XPath than with the getElement methods? If so, I am tempted to revamp the rest of the site. – user2502611 Jan 17 '17 at 18:35
Should the query be around bar rather than foo, i.e. '//bar//text() | //bar//element/@b'? Sorry if this is incorrect, as I am still picking up XPath. – user2502611 Jan 17 '17 at 19:21
Given the input snippet, `//foo//text() | //foo//element/@b` will include the whitespace text nodes before, between and after the `bar` elements. I don't know whether you want them. If you use just `//bar//text() | //bar//element/@b`, the result is only `text 2 and some more textJust text`. – Martin Honnen Jan 17 '17 at 19:38
Ah, thanks, that makes sense. I forgot that the foo node would contain text nodes of its own. I was processing them into a list, so I will use the //foo approach to avoid the plain nodes. – user2502611 Jan 17 '17 at 20:16
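
The difference between the two queries discussed in these comments can be made visible with a small sketch. The helper name joinNodes is hypothetical, and json_encode is used here only to show the line breaks and indentation that would otherwise be invisible.

```php
<?php
// Hypothetical helper: concatenates the values of every node a query selects.
function joinNodes(DOMXPath $xpath, string $query): string
{
    $out = '';
    foreach ($xpath->query($query) as $node) {
        $out .= $node->nodeValue;
    }
    return $out;
}

$xml = <<<EOD
<foo>
   <bar>text <element a="1" b="2" c="3" /> and some more text</bar>
   <bar>Just text</bar>
</foo>
EOD;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Includes the whitespace-only text nodes that are direct children of <foo>.
$viaFoo = joinNodes($xpath, '//foo//text() | //foo//element/@b');
// Only looks inside the <bar> elements, so that whitespace is skipped.
$viaBar = joinNodes($xpath, '//bar//text() | //bar//element/@b');

echo json_encode($viaFoo), PHP_EOL;
echo json_encode($viaBar), PHP_EOL;
```

With //foo the line breaks and indentation between the bar elements end up in the string; with //bar the two pieces of text run together, so a separator has to be added some other way.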

score 0 · Answer 2 · answered Jan 17 '17 at 18:22

The following code should give you an idea of how to achieve your goal without using XPath:

<?php
$xml = '<foo>
    <bar>text <element a="1" b="2" c="3" /> and some more text</bar>
    <bar>Just text</bar>
</foo>'; // Your example XML.

$attr = 'b'; // Attribute of <element> you are interested in.

$doc = new DOMDocument();
$doc->loadXML($xml);

foreach($doc->documentElement->getElementsByTagName('bar') as $bar)
{
    $text = '';
    foreach($bar->childNodes as $child)
    {
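        switch($child->nodeType)
        {
        case XML_ELEMENT_NODE:
            // Replace the nested <element> with the chosen attribute value.
            if($child->nodeName === 'element')
            {
                $text .= $child->getAttribute($attr);
            }
            break;
        case XML_TEXT_NODE:
            // Plain text is copied through unchanged.
            $text .= $child->textContent;
            break;
        }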
    }
    echo $text . PHP_EOL;
}

Will this solution have all of the text content in a single XML_TEXT_NODE, or will the nested element split the text into two XML_TEXT_NODEs? – user2502611 Jan 17 '17 at 18:26
@user2502611 XML DOM treats the content of the first bar element in your example as consisting of three DOM nodes: XML_TEXT_NODE ("text "), XML_ELEMENT_NODE (the <element> element) and XML_TEXT_NODE (" and some more text"). – PowerGamer Jan 17 '17 at 18:36
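
The three-node split described in the comment above can be checked directly by listing the children of the first bar element. This is a minimal sketch; the helper name describeChildren is hypothetical and only the two node types relevant to this example are given readable names.

```php
<?php
// Hypothetical helper: lists the direct children of a node as
// [type name, node name, node value] rows.
function describeChildren(DOMNode $parent): array
{
    // Readable names for the two node types expected in this example.
    $typeNames = [
        XML_ELEMENT_NODE => 'XML_ELEMENT_NODE',
        XML_TEXT_NODE    => 'XML_TEXT_NODE',
    ];

    $rows = [];
    foreach ($parent->childNodes as $child) {
        $type = isset($typeNames[$child->nodeType]) ? $typeNames[$child->nodeType] : 'other';
        $rows[] = [$type, $child->nodeName, $child->nodeValue];
    }
    return $rows;
}

$doc = new DOMDocument();
$doc->loadXML('<bar>text <element a="1" b="2" c="3" /> and some more text</bar>');

foreach (describeChildren($doc->documentElement) as $row) {
    // json_encode shows leading and trailing spaces explicitly.
    echo $row[0], ' ', $row[1], ' ', json_encode($row[2]), PHP_EOL;
}
```

The listing shows a text node, then the element node, then a second text node, which is why the loop in the answer has to handle both node types to rebuild the line.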
