I'm using cURL to execute a SOAP request. It looks like there is a mistake in the returned headers that prevents me from turning the response string into a SimpleXML object with simplexml_load_string. Below is the part of the response that fails in simplexml_load_string:

<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><SOAP-ENV:Header><SOAP-SEC:Signature xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12"><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/><ds:Reference URI="#Body"><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><ds:DigestValue>HV+/cOkUjNCdH5xuiLlGSHVgkUo=</ds:DigestValue></ds:Reference><ds:SignatureValue>MCwCFHXmoMrDUOScwMQ5g76OfxouICjBAhQtGKAorJLUQ0bA0UaKIe1gtmQPgA==</ds:SignatureValue></ds:SignedInfo></ds:Signature></SOAP-SEC:Signature></SOAP-ENV:Header><SOAP-ENV:Body xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12" SOAP-SEC:id="Body">

Is there a way to isolate the SOAP body content and parse only that part with simplexml_load_string?

Below the curl request:

$headers = array(
    "Content-type: text/xml;charset=\"utf-8\"",
    "Accept: text/xml",
    "Cache-Control: no-cache",
    "Pragma: no-cache",
    "Content-length: ".strlen($xml_post_string),
);

$url = $soapUrl;

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml_post_string);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);

$response = curl_exec($ch);
curl_close($ch);

$xml = simplexml_load_string(html_entity_decode($response), 'SimpleXMLElement', LIBXML_NOCDATA);

echo $xml->asXML();

if ($xml === false) {
    echo "Failed to load XML: ";

    foreach(libxml_get_errors() as $error) {
        echo "<br>", $error->message;
    }
} else {
    var_dump($xml);
}

Frank W.
  • **[You should not switch off `CURLOPT_SSL_VERIFYHOST` or `CURLOPT_SSL_VERIFYPEER`](https://paragonie.com/blog/2017/10/certainty-automated-cacert-pem-management-for-php-software)**. It could be a security risk! [Here is how to get the certificate bundle if your server is missing one](https://stackoverflow.com/a/32095378/1839439) – Dharman Jun 19 '20 at 14:51
  • Hello @Dharman, thanks for your answer. I will turn both curl options back on. Do you also know how I can fix the response? Or is this just an error from the supplier? Thanks again! – Frank W. Jun 22 '20 at 07:15
  • Are you getting an actual error message? The XML is well-formed as far as I can tell, and all element namespaces appear to line up at a quick glance. – Chris Haas Jun 23 '20 at 13:07
  • Hi @ChrisHaas, thanks for your response. Unfortunately, the return of simplexml_load_string stays empty. When I manually remove the envelope tag, headers and body tag it works like a charm. Unfortunately these elements get returned with every response. – Frank W. Jun 24 '20 at 07:30

2 Answers

I don't have an answer for you right now, but you first need to separate curl from XML processing. You should start with logging your result from curl and making sure it is sane and what you expect. If it is, then move on to parsing it. curl should never break/change your data in any way, but the request itself (headers, etc.) might change the server's response.

Since I can't validate your server, I'm just going to go off of what you've provided. I've closed the <SOAP-ENV:Body> tag and reformatted the XML to make it readable, but otherwise it is untouched. This code parses the XML without a problem and then emits it exactly as expected.

$response = <<<'TAG'
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <SOAP-ENV:Header>
        <SOAP-SEC:Signature xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12">
            <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
                <ds:SignedInfo>
                    <ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1" />
                    <ds:Reference URI="#Body">
                        <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />
                        <ds:DigestValue>HV+/cOkUjNCdH5xuiLlGSHVgkUo=</ds:DigestValue>
                    </ds:Reference>
                    <ds:SignatureValue>MCwCFHXmoMrDUOScwMQ5g76OfxouICjBAhQtGKAorJLUQ0bA0UaKIe1gtmQPgA==</ds:SignatureValue>
                </ds:SignedInfo>
            </ds:Signature>
        </SOAP-SEC:Signature>
    </SOAP-ENV:Header>
    <SOAP-ENV:Body xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12" SOAP-SEC:id="Body"></SOAP-ENV:Body>
</SOAP-ENV:Envelope>
TAG;

$xml = simplexml_load_string(html_entity_decode($response), 'SimpleXMLElement', LIBXML_NOCDATA);

echo '<pre>';
print_r(htmlspecialchars($xml->asXML()));
echo '</pre>';

The output is exactly the same as the input, except that it includes the XML declaration and converts the body tag to a self-closing one:

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <SOAP-ENV:Header>
        <SOAP-SEC:Signature xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12">
            <ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
                <ds:SignedInfo>
                    <ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
                    <ds:Reference URI="#Body">
                        <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
                        <ds:DigestValue>HV+/cOkUjNCdH5xuiLlGSHVgkUo=</ds:DigestValue>
                    </ds:Reference>
                    <ds:SignatureValue>MCwCFHXmoMrDUOScwMQ5g76OfxouICjBAhQtGKAorJLUQ0bA0UaKIe1gtmQPgA==</ds:SignatureValue>
                </ds:SignedInfo>
            </ds:Signature>
        </SOAP-SEC:Signature>
    </SOAP-ENV:Header>
    <SOAP-ENV:Body xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12" SOAP-SEC:id="Body"/>
</SOAP-ENV:Envelope>

So use this as a baseline. Write your curl response to a text file before doing anything else, then read that text file back in and perform your logic on it. Any transformation you apply to the XML string should also be logged and compared to make sure it is doing what you expect. In production you'd skip that, but it helps during debugging.
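As a rough sketch of that logging step: the helper name logAndReload and the log file location below are made up for illustration, and the $response string is a stand-in for whatever curl_exec($ch) returned.

```php
<?php
// Hypothetical helper: the name and the log path are illustrative only.
function logAndReload(string $rawResponse, string $logFile): string
{
    // Write the bytes exactly as curl returned them, with no decoding or trimming.
    if (file_put_contents($logFile, $rawResponse) === false) {
        throw new RuntimeException("Could not write $logFile");
    }

    // Read the file back so that later steps work from the logged copy.
    $reloaded = file_get_contents($logFile);
    if ($reloaded === false) {
        throw new RuntimeException("Could not read $logFile");
    }

    return $reloaded;
}

// Stand-in for the string returned by curl_exec($ch).
$response = '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body/></SOAP-ENV:Envelope>';

$logFile  = sys_get_temp_dir() . '/soap-response.xml';
$reloaded = logAndReload($response, $logFile);

// The logged copy should be byte-for-byte identical to what curl returned.
var_dump($reloaded === $response);
```

Once the logged file looks sane, you can feed $reloaded to simplexml_load_string and debug the parsing on its own, without hitting the server again.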

Also, I'm not really sure what the point of html_entity_decode is here. If you are receiving XML (as your request's MIME type specifies), then it shouldn't have any escape sequences applied to it, but maybe you have an exceptional case.
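To show why decoding first can be risky, here is a small sketch with made-up data (the <Body> and <Name> elements are not from your response). Any escaped character in the text content turns into a raw one, and the string is then no longer well-formed XML:

```php
<?php
// Collect libxml parse errors internally instead of emitting warnings.
libxml_use_internal_errors(true);

// Well-formed XML whose text content contains an escaped ampersand (illustrative data).
$raw = '<Body><Name>Tom &amp; Jerry</Name></Body>';

// Parsing the raw string works, and the parser unescapes the entity itself.
$parsedRaw = simplexml_load_string($raw);
echo (string) $parsedRaw->Name, PHP_EOL;

// Decoding first turns &amp; into a bare &, which is not well-formed XML.
$decoded       = html_entity_decode($raw);
$parsedDecoded = simplexml_load_string($decoded);
var_dump($parsedDecoded === false);
```

I can't tell whether your supplier's payload contains such characters, so treat this as a reason to test without html_entity_decode rather than as a diagnosis.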

Chris Haas

Just to give some example XML content, this will vary for any file but just shows how you can access the data...

<SOAP-ENV:Body
    xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12"
    SOAP-SEC:id="Body">
        <BodyContent>SomeData</BodyContent>
        <OtherContent>2</OtherContent>
</SOAP-ENV:Body>

Then it would be a case of using XPath to find the <SOAP-ENV:Body> tag:

$xml->registerXPathNamespace("SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/");
$bodyBlock = $xml->xpath("//SOAP-ENV:Body")[0];

(note that as xpath() returns a list of matches, using [0] just uses the first one).
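If the element might be missing from some responses, indexing with [0] directly will trip over an empty result. A minimal sketch of a guard, assuming a helper called firstMatch (my own name, not part of SimpleXML) and a shortened envelope for the demo:

```php
<?php
// Hypothetical helper (not part of SimpleXML): returns the first XPath match or null.
function firstMatch(SimpleXMLElement $xml, string $path): ?SimpleXMLElement
{
    $matches = $xml->xpath($path);

    // xpath() gives an empty array when nothing matches, and a non-array value on error.
    if (!is_array($matches) || count($matches) === 0) {
        return null;
    }

    return $matches[0];
}

// Shortened envelope, used only for this demonstration.
$xml = simplexml_load_string(
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">'
    . '<SOAP-ENV:Body><BodyContent>SomeData</BodyContent></SOAP-ENV:Body>'
    . '</SOAP-ENV:Envelope>'
);
$xml->registerXPathNamespace("SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/");

$bodyBlock = firstMatch($xml, "//SOAP-ENV:Body");   // present in this document
$missing   = firstMatch($xml, "//SOAP-ENV:Fault");  // absent in this document

var_dump($bodyBlock !== null);
var_dump($missing === null);
```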

This next part depends on the message being processed, but as the example I gave has child elements with no namespace prefix, you can extract these using ->children(), which eases access to the contents. The main part is that at this point $bodyBlock contains this...

<SOAP-ENV:Body xmlns:SOAP-SEC="http://schemas.xmlsoap.org/soap/security/2000-12" SOAP-SEC:id="Body">
        <BodyContent>SomeData</BodyContent>
        <OtherContent>2</OtherContent>
</SOAP-ENV:Body>
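As a side note on ->children(): called without arguments it only returns elements that have no namespace, which may be why a parsed envelope can look empty at first glance. The sketch below uses a shortened envelope of my own (with the same example body) to show the difference. Passing the namespace URI is an alternative to XPath here, not a replacement for it.

```php
<?php
$soapNs = "http://schemas.xmlsoap.org/soap/envelope/";

// Shortened envelope, used only for this demonstration.
$xml = simplexml_load_string(
    '<SOAP-ENV:Envelope xmlns:SOAP-ENV="' . $soapNs . '">'
    . '<SOAP-ENV:Header/>'
    . '<SOAP-ENV:Body><BodyContent>SomeData</BodyContent><OtherContent>2</OtherContent></SOAP-ENV:Body>'
    . '</SOAP-ENV:Envelope>'
);

// Without arguments, children() only sees elements that have no namespace,
// so the envelope appears to have no children at all.
echo count($xml->children()), PHP_EOL;

// Passing the namespace URI exposes the Header and Body elements.
$envelopeChildren = $xml->children($soapNs);
echo count($envelopeChildren), PHP_EOL;

// Body's own children have no namespace here, so plain children() reaches them.
$body = $envelopeChildren->Body->children();
echo $body->BodyContent, PHP_EOL;
```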

So to put that together in your original code...

// Collect libxml errors internally so that libxml_get_errors() has something to report
libxml_use_internal_errors(true);

$xml = simplexml_load_string($response, 'SimpleXMLElement', LIBXML_NOCDATA);

if ($xml === false) {
    echo "Failed to load XML: ";

    foreach (libxml_get_errors() as $error) {
        echo "<br>", $error->message;
    }
} else {
    // Search for the Body element (this is in the SOAP-ENV namespace)
    $xml->registerXPathNamespace("SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/");
    $bodyBlock = $xml->xpath("//SOAP-ENV:Body")[0];

    // If the content does not have a namespace, extract the children that have no namespace
    $body = $bodyBlock->children();

    // You can now access the content.
    echo $body->BodyContent.PHP_EOL;
    echo $body->OtherContent;
}

which outputs the two values in the body:

SomeData
2

Nigel Ren