
I have a unique situation I'd love some help on if there are any XML experts reading this.

I need to insert an HTML-escaped XML payload into an XML request (my god, don't ask!). The issue I'm having is that the Symfony XML encoder wants to add CDATA to my xmlRequest node because it contains "&" characters once the payload is escaped.

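    use Symfony\Component\Serializer\Encoder\XmlEncoder;
    use Symfony\Component\Serializer\Serializer;

    // Placeholder payload (the real one is redacted); htmlspecialchars() needs a string, not an array.
    $requestData = "<Foo><value>Bar</value></Foo>";
    $xmlArray = [
        "@xmlns:soap12" => "http://www.w3.org/2003/05/soap-envelope",
        "soap12:Body" => [
            "Login" => [
                "@xmlns" => "http://www.ahsl.com/",
                "xmlRequest" => htmlspecialchars($requestData),
            ]
        ]
    ];

    $encoders = [new XmlEncoder()];
    $serializer = new Serializer([], $encoders);

    // serialize() takes the data, then the format, then the context array.
    return $serializer->serialize(
        $xmlArray,
        XmlEncoder::FORMAT,
        [XmlEncoder::ROOT_NODE_NAME => "soap12:Envelope"]
    );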

This produces the following (I had to cut out sensitive information, so this may not be perfect; it's just an example):

    <?xml version="1.0" encoding="UTF-8"?>
    <soap12:Envelope xmlns:soap12="http://www.w3.org/2003/05/soap-envelope"><soap12:Body><Login xmlns="http://www.ahsl.com/"><xmlRequest><![CDATA[&lt;Foo&gt;&lt;value&gt;Bar&lt;/value&gt;&lt;/Foo&gt;]]></xmlRequest></Login></soap12:Body></soap12:Envelope>

I need to insert that $requestData without the CDATA, so that it looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <soap12:Envelope xmlns:soap12="http://www.w3.org/2003/05/soap-envelope"><soap12:Body><Login xmlns="http://www.ahsl.com/"><xmlRequest>&lt;Foo&gt;&lt;value&gt;Bar&lt;/value&gt;&lt;/Foo&gt;</xmlRequest></Login></soap12:Body></soap12:Envelope>

The issue is partially a bug and partially the library saying "we didn't expect anyone to do this madness".

Within the XmlEncoder class is this block:

    /**
     * Checks if a value contains any characters which would require CDATA wrapping.
     */
    private function needsCdataWrapping(string $val): bool
    {
        return preg_match('/[<>&]/', $val);
    }

The library thinks it needs to add CDATA because htmlspecialchars injects an "&" into the value: it assumes it is seeing raw XML when actually it is not. Has anyone found a hack to get around this? Or is there a different flow, or a fork, that would let me insert the data after XmlEncoder has done its encoding? As far as I can tell, there is no way to trick this library into not inserting that CDATA.
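To make the behaviour concrete, here is a minimal sketch that applies the same pattern as the quoted check to a raw and an escaped string. The payload is an illustrative placeholder, not the real request, and the variable names are my own.

```php
<?php
// Same character class as the needsCdataWrapping() check quoted above.
$pattern = '/[<>&]/';

// Illustrative placeholder payload (assumption: the real one is redacted).
$raw     = '<Foo><value>Bar</value></Foo>';
$escaped = htmlspecialchars($raw);

// Escaping removes every < and >, but each entity it emits starts with &,
// so the escaped string still matches the pattern.
$rawMatches     = preg_match($pattern, $raw) === 1;
$escapedMatches = preg_match($pattern, $escaped) === 1;
$plainMatches   = preg_match($pattern, 'Bar') === 1;

var_dump($rawMatches, $escapedMatches, $plainMatches);
```

Only the plain string avoids the match, which is why escaping the payload beforehand cannot get past this check.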

RonnyKnoxville
  • It was clearly made for general purpose, and that function is hard-coded in, so if there are any ampersands, it adds a CDATA. While technically a single ampersand is not valid XML unless it's in a CDATA tag, an escaped character sequence is valid without. Just use another library like `DOMDocument` or `SimpleXML`. – Jim Oct 03 '22 at 20:11
  • This is what I ended up doing for now. Thanks for confirming my thoughts on this. My plan was to extend the library to make CDATA toggleable but that would require forking the library, so `SimpleXML` it is. – RonnyKnoxville Oct 04 '22 at 08:28
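For anyone landing here, a rough sketch of the `DOMDocument` route mentioned in the comments might look like the following. It is not the code the asker ended up with (that was `SimpleXML` and was not posted); the element names and namespaces are taken from the question, and the payload is an illustrative placeholder. Note that `htmlspecialchars()` is not called at all, since a text node is escaped when the document is serialized.

```php
<?php
// Illustrative placeholder payload (assumption: the real one was not posted).
$payload = '<Foo><value>Bar</value></Foo>';

$soapNs  = 'http://www.w3.org/2003/05/soap-envelope';
$loginNs = 'http://www.ahsl.com/';

$doc = new DOMDocument('1.0', 'UTF-8');

$envelope = $doc->createElementNS($soapNs, 'soap12:Envelope');
$doc->appendChild($envelope);

$body = $doc->createElementNS($soapNs, 'soap12:Body');
$envelope->appendChild($body);

$login = $doc->createElementNS($loginNs, 'Login');
$body->appendChild($login);

$xmlRequest = $doc->createElementNS($loginNs, 'xmlRequest');
$login->appendChild($xmlRequest);

// The raw payload goes in as a text node; saveXML() escapes <, > and &
// itself, so no CDATA section is involved.
$xmlRequest->appendChild($doc->createTextNode($payload));

$xml = $doc->saveXML();
echo $xml;
```

The payload should come out as `&lt;Foo&gt;...` inside `xmlRequest`, with no CDATA wrapper. Escaping it with `htmlspecialchars()` first would double-escape it to `&amp;lt;Foo&amp;gt;...`.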

0 Answers