5

When I use DOMDocument::loadXML() on the XML below, I get these errors:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: CData section not finished http://www.site.org/displayimage.php?album=se in Entity,
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag image line 7 in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizz line 3 in Entity
Warning: DOMDocument::loadXML() [domdocument.loadxml]: Premature end of data in tag quizzes line 2 in Entity
Fatal error: Call to a member function getElementsByTagName() on a non-object 

It seems to me that my CDATA sections are closed, but I still get this error. The XML looks like this:

<?xml version="1.0" encoding="utf-8"?>
<quizzes>
<quizz>
<title><![CDATA[Title]]></title>
<descr><![CDATA[Some text here!]]></descr>
<tags><![CDATA[one tag, second tag]]></tags>
<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image>
<results>
<result>
<title><![CDATA[Something]]></title>
<descr><![CDATA[Some text here]]></descr>
<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=17]]></image>
<id>1</id>
</result>
</results>
</quizz>
</quizzes>

Could you help me find out what the problem is?

Tom Smykowski
  • "XML looks like this" - "looks like" or "exactly is"? And if you don't want us to see the actual url you might want to change not only the document but the error message as well. – VolkerK May 09 '10 at 16:30
  • @VolkerK Thanks, I don't want to advertise any website here. – Tom Smykowski May 10 '10 at 14:48

5 Answers

11

I have found that the problem is usually hidden characters that are invalid in XML, so I prefer to strip them, as shown below:

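<?php
// $feedXml is the fetched XML content (assumed to be valid UTF-8)
// Match anything outside the XML 1.0 character range. The \x{...} syntax
// and the /u flag are needed for code points above 0xFF.
$invalid_characters = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
$feedXml = preg_replace($invalid_characters, '', $feedXml);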
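
To see whether stripping actually fixes the parse, it can help to check the return value of loadXML() instead of relying on the warnings. The following is only a minimal sketch, assuming UTF-8 input; the helper name loadCleanXml is hypothetical and not part of the answer above:

```php
<?php
// Hypothetical helper: parse $xml, and if that fails, retry once after
// stripping characters outside the XML 1.0 range. Returns a DOMDocument,
// or null if both attempts fail.
function loadCleanXml(string $xml): ?DOMDocument
{
    if ($xml === '') {
        return null; // loadXML() rejects an empty string
    }

    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $doc = new DOMDocument();

    $ok = $doc->loadXML($xml);
    if (!$ok) {
        libxml_clear_errors();
        $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
        $cleaned = preg_replace($pattern, '', $xml);
        // preg_replace() returns null if $xml is not valid UTF-8
        $ok = $cleaned !== null && $cleaned !== '' && $doc->loadXML($cleaned);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return $ok ? $doc : null;
}
```

Stripping characters will not help when the document itself has been truncated, which is what turned out to be the case in the question.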
lenI
2

Sorry if this is off topic, since it only applies to a specific case in PHP when using cURL, but, as tomaszs states, I too discovered that ampersands can cause a problem when passing XML via cURL in PHP. I was receiving a known-valid XML string with ampersands properly encoded and then forwarding it to another address with cURL. Something like this...

$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL,            $fullUri);
curl_setopt($curlHandle, CURLOPT_HEADER,         false);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlHandle, CURLOPT_CONNECTTIMEOUT, 4); // seconds
curl_setopt($curlHandle, CURLOPT_POST,           true);
curl_setopt($curlHandle, CURLOPT_POSTFIELDS,     "xmlstr=" . $xmlstr); // Problem

The issue occurs in the last line above, when adding the XML to CURLOPT_POSTFIELDS. The first encoded ampersand is seen as a parameter delimiter, as in a query string, and the "xmlstr" variable/field is truncated.

The solution I used was to replace the last line above with...

curl_setopt($curlHandle, CURLOPT_POSTFIELDS,     "xmlstr=" . urlencode($xmlstr));

Hope this helps someone.
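
An alternative to calling urlencode() by hand is to let http_build_query() build the whole POST body, since it encodes every value. This is only a minimal sketch; the helper name buildXmlPostBody is hypothetical, and the field name xmlstr is taken from the snippet above:

```php
<?php
// Hypothetical helper: build an application/x-www-form-urlencoded body
// with the XML in a single field. http_build_query() percent-encodes
// '&', '=', '<' and other reserved characters in the value.
function buildXmlPostBody(string $xmlstr): string
{
    return http_build_query(['xmlstr' => $xmlstr]);
}

// Usage with the existing handle:
// curl_setopt($curlHandle, CURLOPT_POSTFIELDS, buildXmlPostBody($xmlstr));
```

This keeps the encoding in one place if more fields are added to the request later.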

Night Owl
0

The answers here have the right idea: there is some sort of bad, possibly non-printing, character in the document, which breaks the parser. None of the answers above solved my problem. Instead, I used tr to write a "clean" version of the file, and I was then able to parse that, i.e.:

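<?php
// simplexml_load_file() returns false on failure; it only throws if an
// error handler converts its warnings into exceptions, so handle both.
try {
    $simpleXMLobject = simplexml_load_file($feed);
} catch (\Exception $ex) {
    $simpleXMLobject = false;
}
if ($simpleXMLobject === false) {
    // try to clean the file (keep tab, LF, CR and printable ASCII) and reload it
    $tempFile = sys_get_temp_dir() . "/" . uniqid("rdc");
    shell_exec(
        "tr -cd '\11\12\15\40-\176' < " .
        escapeshellarg($feed) . " > " .
        escapeshellarg($tempFile)
    );
    try {
        $simpleXMLobject = simplexml_load_file($tempFile);
    } catch (\Exception $ex) {
        die($ex->getTraceAsString());
    }
    if ($simpleXMLobject === false) {
        die("Could not parse the cleaned copy of the feed");
    }
}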
chiliNUT
-2

I don't see any error. Either the XML actually used is different from the XML provided, or the XML processor used (which one is it, by the way?) is buggy.

I would recommend avoiding CDATA sections. Use the following XML document, which is text-equivalent to the one provided and much more readable:

<quizzes>
   <quizz>
      <title>Title</title>
      <descr>Some text here!</descr>
      <tags>one tag, second tag</tags>
      <image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=1</image>
      <results>
         <result>
            <title>Something</title>
            <descr>Some text here</descr>
            <image>http://www.site.org/displayimage.php?album=search&amp;cat=0&amp;pos=17</image>
            <id>1</id>
         </result>
      </results>
   </quizz>
</quizzes>
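
If the document is generated in PHP, the escaping can be left to the DOM extension rather than typed by hand. This is only a minimal sketch, assuming the image element from the question; the helper name imageElementXml is hypothetical:

```php
<?php
// Hypothetical helper: serialize an <image> element whose text is $url.
// createTextNode() stores the raw string; saveXML() escapes '&' as '&amp;'.
function imageElementXml(string $url): string
{
    $doc = new DOMDocument('1.0', 'utf-8');
    $image = $doc->createElement('image');
    $image->appendChild($doc->createTextNode($url));
    $doc->appendChild($image);

    // Passing the node returns only that element, without the XML declaration.
    return $doc->saveXML($image);
}
```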
Dimitre Novatchev
-2

I've found that the problem was in how I passed this XML with cURL in PHP. I sent it as plain text, and the & character in the XML was interpreted as the delimiter before the next parameter. Once I escaped this character, it started to work properly.
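
The truncation can be reproduced without cURL by decoding the unescaped body the way a form decoder would. This is only a minimal sketch using parse_str(); the helper name decodeXmlField is hypothetical, and the field name xmlstr is an assumption, since the original request is not shown:

```php
<?php
// Hypothetical helper: decode a form-encoded body and return the
// "xmlstr" field as the receiving script would see it.
function decodeXmlField(string $body): string
{
    parse_str($body, $fields);
    return $fields['xmlstr'] ?? '';
}

$xml = '<image><![CDATA[http://www.site.org/displayimage.php?album=search&cat=0&pos=1]]></image>';

// Unescaped: the value ends at the first '&', so the CDATA section is never closed.
$broken = decodeXmlField('xmlstr=' . $xml);

// Escaped: the full XML survives the round trip.
$intact = decodeXmlField('xmlstr=' . urlencode($xml));
```

Here $broken stops right after album=search, which matches the "CData section not finished" warning in the question.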

Tom Smykowski