I have an XML file that contains HTML. I'd like to remove certain attributes from the HTML tags while keeping the tags themselves. For example:

<description><![CDATA[<div><span style='font-size: 40px'>Testing123</span></div>]]></description>

I'd like to remove the 'style' attribute so that the output is:

<description><![CDATA[<div><span>Testing123</span></div>]]></description>

I partly got this working with preg_replace, but the formatting of the file was badly off after I saved it. I want the XML file's formatting to be preserved once the attributes have been stripped.

EDIT: My initial sample data didn't include the CDATA section that is in my XML file. I've updated the example.

user3452136

3 Answers

I'm not sure about the formatting, but try using SimpleXML and the unset() function:

$string = "<div><span style='font-size: 40px'>Testing123</span></div>";
$xml = simplexml_load_string($string);
$target = $xml->xpath("//span/@style");
foreach ($target as $node) {
    unset($node[0]);
}

echo $xml->asXML();

Output:

<?xml version="1.0"?>
<div><span>Testing123</span></div>
Jack Fleeting

XSLT 3.0 solution:

<transform xmlns="http://www.w3.org/1999/XSL/Transform" version="3.0">
 <mode on-no-match="shallow-copy"/>
 <template match="span/@style"/>
</transform>

You can extend the match pattern, or add more template rules, depending on which attributes you want to remove.

XSLT 1.0 works as well; it's just a bit more verbose.
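
To illustrate the "more verbose" XSLT 1.0 form, here is a minimal sketch that runs it from PHP. It assumes the xsl extension (XSLTProcessor) is enabled; the stylesheet uses the usual identity template in place of the 3.0 shallow-copy mode. Note that a stylesheet sees CDATA content as plain text, so this applies to markup that has actually been parsed, not to HTML still wrapped in a CDATA section.

```php
<?php
// Assumes PHP's xsl extension is available (it wraps libxslt, XSLT 1.0 only).
$xslt = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: matching style attributes are not copied -->
  <xsl:template match="span/@style"/>
</xsl:stylesheet>
XSL;

// Sample input taken from the question, without the CDATA wrapper.
$input = new DOMDocument();
$input->loadXML("<div><span style='font-size: 40px'>Testing123</span></div>");

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xslt);

$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);

$output = $processor->transformToXml($input);
echo $output;
```

The empty template wins over the identity template because its pattern is more specific, which is what drops the attribute.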

Michael Kay

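You have to load the string from the CDATA section as XML (more correctly, HTML) and then remove the attributes, for example with XPath. The expression below keeps the root element's id and removes the other attributes of the root and of its direct children:

// $buffer holds the outer XML document (see the input below)
$xml = simplexml_load_string($buffer);
// Casting the element to string yields the CDATA text, which is parsed as XML
$cdata = simplexml_load_string((string)$xml);
// Root attributes except "id", plus all attributes of the root's children
foreach ($cdata->xpath('@*[name(.) != "id"]|*/@*') as $attribute) {
    unset($attribute[0]); // the self-reference removes the attribute node
}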

(compare: an answer to Remove a child with a specific attribute, in SimpleXML for PHP)

Then, if you need to preserve the CDATA section, you need a round trip through DOM, because SimpleXML cannot create CDATA sections but DOMDocument can:

    // $el is the SimpleXMLElement to fill, $data the string to wrap in CDATA
    if ($n = dom_import_simplexml($el)) {
        $cd = $n->ownerDocument->createCDATASection($data);
        $n->appendChild($cd);
    }

(compare: How to write CDATA using SimpleXmlElement?; check the PHP manual for even more details)

You also need to serialize $cdata back to a string, dropping the first line of the output because it contains the XML declaration:

    rtrim(explode("\n", $el->asXML(), 2)[1]);

(compare: SimpleXML Line Separator in SimpleXML Type Cheatsheet)

Given an input like:

$buffer=<<<XML
<description><![CDATA[<div id="2" style="all: inherit;"><span style='font-size: 40px'>Testing123</span></div>]]></description>
XML;

the result is:

<?xml version="1.0"?>
<description><![CDATA[<div id="2"><span>Testing123</span></div>]]></description>

Example on 3v4l.org; Output for 7.3.0 - 7.3.29, 7.4.0 - 7.4.21, 8.0.0 - 8.0.8:

<?xml version="1.0"?>
<description><![CDATA[<div id="2"><span>Testing123</span></div>]]></description>
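
The three steps above (parse the CDATA text, strip attributes, write a new CDATA section) can be assembled into one script. The following is a sketch under a few assumptions, not necessarily the code behind the 3v4l.org link: stripAttributes() is a hypothetical helper name, the CDATA content must be well-formed XML with a single root element, and the XPath is widened to //@* so that attributes at any depth are covered.

```php
<?php
// Hypothetical helper: remove every attribute whose name is not in $keep.
function stripAttributes(string $fragment, array $keep = ['id']): string
{
    $html = simplexml_load_string($fragment);
    if ($html === false) {
        return $fragment; // not well-formed XML: leave it untouched
    }
    // xpath() returns a plain array, so unsetting while looping is safe.
    foreach ($html->xpath('//@*') as $attribute) {
        if (!in_array($attribute->getName(), $keep, true)) {
            unset($attribute[0]); // self-reference removes the attribute
        }
    }
    // asXML() puts the XML declaration on the first line; drop that line.
    return rtrim(explode("\n", $html->asXML(), 2)[1]);
}

$buffer = <<<XML
<description><![CDATA[<div id="2" style="all: inherit;"><span style='font-size: 40px'>Testing123</span></div>]]></description>
XML;

$xml = simplexml_load_string($buffer);
$clean = stripAttributes((string) $xml);

// SimpleXML cannot create CDATA sections, so do the write through DOM.
$node = dom_import_simplexml($xml);
while ($node->firstChild) {
    $node->removeChild($node->firstChild); // discard the old CDATA section
}
$node->appendChild($node->ownerDocument->createCDATASection($clean));

$result = $xml->asXML();
echo $result;
```

For a file with many description elements, the same replacement could be run in a loop over the matching elements. Whitespace outside the CDATA sections is left to the serializer, so it is worth diffing the saved file against the original.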
hakre