If you consider any element node without child nodes empty, //*[not(node())] will select it. However, removing those element nodes can leave their parents empty in turn. So you need an expression that matches not only the currently empty element nodes, but also those whose descendants are all empty (recursively). Additionally, you might want to avoid removing the document element even if it is empty, because that would result in an invalid document.
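To illustrate the cascade, here is a minimal sketch with a hypothetical input that has no whitespace between the tags. Whitespace-only text nodes would count as child nodes for node() and hide the effect. Each pass of //*[not(node())] finds one more element, and the final pass matches the document element itself.

```php
<?php
// Hypothetical input: <bar> only contains an empty element, no whitespace text.
$xml = new SimpleXMLElement('<foo><bar><empty/></bar></foo>');
$expression = '//*[not(node())]';

// Read the names before removing anything; removed nodes can no longer be accessed.
$nameOf = function (SimpleXMLElement $node): string {
    return $node->getName();
};

// Pass 1: only <empty/> has no child nodes.
$pass1 = $xml->xpath($expression);
$names1 = array_map($nameOf, $pass1);
foreach ($pass1 as $node) {
    unset($node[0]);
}

// Pass 2: removing <empty/> has left <bar> without child nodes.
$pass2 = $xml->xpath($expression);
$names2 = array_map($nameOf, $pass2);
foreach ($pass2 as $node) {
    unset($node[0]);
}

// Pass 3: now the document element matches; it is only queried, not removed.
$names3 = array_map($nameOf, $xml->xpath($expression));

echo implode(',', $names1), ' | ', implode(',', $names2), ' | ', implode(',', $names3), "\n";
// empty | bar | foo
```

Each removal pass only catches the elements that are empty at that moment, which is why the expression below looks at all descendants at once instead.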
Building up the expression
- Select the document element
/*
- Any descendant of the document element
/*//*
- ...with only whitespace as text content (this includes the text of all descendants)
/*//*[normalize-space(.) = ""]
- ...and no attributes
/*//*[normalize-space(.) = "" and not(@*)]
- ...and no descendants with attributes
/*//*[normalize-space(.) = "" and not(@* or .//*[@*])]
- ...and no descendant comments
/*//*[normalize-space(.) = "" and not(@* or .//*[@*] or .//comment())]
- ...and no descendant processing instructions (PIs)
/*//*[
normalize-space(.) = "" and not(@* or .//*[@*] or .//comment() or .//processing-instruction())
]
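To see what each step excludes, the intermediate expressions can be evaluated against a small sample. This is a minimal sketch; the sample document is hypothetical, with each <bar> element built to trip exactly one of the conditions.

```php
<?php
// Hypothetical sample: <empty/> plus one <bar> per condition in the list above.
$xml = new SimpleXMLElement(
    '<foo><empty/><bar attr="v"/><bar><x attr="v"/></bar>'
    . '<bar><!--c--></bar><bar><?pi data?></bar><bar>text</bar></foo>'
);

// The expression after each step of the list above.
$steps = [
    '/*//*',
    '/*//*[normalize-space(.) = ""]',
    '/*//*[normalize-space(.) = "" and not(@*)]',
    '/*//*[normalize-space(.) = "" and not(@* or .//*[@*])]',
    '/*//*[normalize-space(.) = "" and not(@* or .//*[@*] or .//comment())]',
    '/*//*[normalize-space(.) = "" and not(@* or .//*[@*] or .//comment() or .//processing-instruction())]',
];

// Count how many elements each expression still matches.
$counts = [];
foreach ($steps as $expression) {
    $counts[] = count($xml->xpath($expression));
}

echo implode(', ', $counts), "\n"; // 7, 6, 4, 3, 2, 1
```

Only <empty/> survives the final expression: the attribute, the nested attribute, the comment and the PI each protect one <bar>, and normalize-space() protects the one with text.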
Putting it together
The XPath result is in document order, with ancestors before their descendants, so iterate it in reverse to delete child nodes before their parents.
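<?php
$xmlString = <<<'XML'
<foo>
<empty/>
<empty></empty>
<bar><empty/></bar>
<bar attr="value"><empty/></bar>
<bar>text</bar>
<bar>
<empty/>
text
</bar>
<bar>
<!-- comment -->
</bar>
</foo>
XML;

$xml = new SimpleXMLElement($xmlString);

// Descendants of the document element whose text content is only whitespace
// and that have no attributes, comments or processing instructions at any depth
$xpath = '/*//*[
  normalize-space(.) = "" and
  not(
    @* or
    .//*[@*] or
    .//comment() or
    .//processing-instruction()
  )
]';

// Reverse the document-order result so children are removed before parents;
// "?: []" guards against xpath() returning false/null on failure
foreach (array_reverse($xml->xpath($xpath) ?: []) as $remove) {
    // Unsetting the element's self-reference removes it from the document
    unset($remove[0]);
}

echo $xml->asXML();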
Output:
<?xml version="1.0"?>
<foo>
<bar attr="value"/>
<bar>text</bar>
<bar>
text
</bar>
<bar>
<!-- comment -->
</bar>
</foo>
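The same query and reverse loop can be wrapped in a reusable function. This is a minimal sketch: the function name removeEmptyElements and its return value (the names of the removed elements, in removal order) are hypothetical, chosen to make the reverse order visible.

```php
<?php
// Hypothetical helper: removes the empty descendants of the document element
// and returns the names of the removed elements in the order they were removed.
function removeEmptyElements(SimpleXMLElement $xml): array
{
    $expression = '/*//*[
      normalize-space(.) = "" and
      not(@* or .//*[@*] or .//comment() or .//processing-instruction())
    ]';

    // Matches come back in document order (ancestors first); false/null on failure.
    $matches = $xml->xpath($expression) ?: [];

    $removed = [];
    // Reverse order: descendants are unset before their ancestors.
    foreach (array_reverse($matches) as $node) {
        $removed[] = $node->getName(); // read the name before the node is removed
        unset($node[0]);
    }
    return $removed;
}

$xml = new SimpleXMLElement('<foo><bar><empty/></bar><bar>text</bar></foo>');
$removed = removeEmptyElements($xml);

echo implode(', ', $removed), "\n"; // empty, bar
echo $xml->asXML();
```

The returned list shows the inner <empty/> being removed before its parent <bar>, while the <bar> with text content stays in the document.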