108

I am reading an XML file in PHP using simplexml_load_file. However, while trying to load the XML, it displays a list of warnings:

Warning: simplexml_load_file() [function.simplexml-load-file]: <project orderno="6" campaign_name="International Relief & Development" project in /home/bluecard1/public_html/test.php on line 3    
Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/bluecard1/public_html/test.php on line 3    
Warning: simplexml_load_file() [function.simplexml-load-file]: http://..../index.php/site/projects/:15: parser error : xmlParseEntityRef: no name in /home/bluecard1/public_html/test.php on line 3

Warning: simplexml_load_file() [function.simplexml-load-file]: ional Relief & Development" project_id="313" client_name="International Relief & in /home/bluecard1/public_html/test.php on line 3    
Warning: simplexml_load_file() [function.simplexml-load-file]: ^ in /home/bluecard1/public_html/test.php on line 3    
Warning: simplexml_load_file() [function.simplexml-load-file]: http://..../index.php/site/projects/:15: parser error : xmlParseEntityRef: no name in /home/bluecard1/public_html/test.php on line 3

How do I fix this so that these warnings go away?

(The XML is generated from the URL http://..../index.php/site/projects and loaded into a variable in test.php. I don't have write privileges to index.php.)

hakre
Rajat Gupta
  • The XML is invalid. You might not be able to load it at all. Errors can be suppressed by adding `@` in front of `simplexml_load_file` or by adding a flag, see the manual page of `simplexml_load_file` for more information and please delete your question, it's a duplicate. – hakre Sep 29 '11 at 23:54
  • I can see that my answer is getting quite a lot of attention, if that's actually the solution: can you please flag it as "correct answer"? thanks. – ricricucit May 13 '14 at 12:12
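The error handling mentioned in the comments above can also be done without `@`. The following is a minimal sketch, assuming the XML has already been fetched into a string; `loadXmlQuietly` is a hypothetical helper name, while `libxml_use_internal_errors`, `libxml_get_errors` and `libxml_clear_errors` are standard PHP functions. It only hides and collects the warnings; the invalid XML still fails to load.

```php
<?php
/**
 * Try to parse an XML string without emitting PHP warnings.
 * Returns [SimpleXMLElement|false, string[]]: the parsed document (or false)
 * and the list of libxml error messages that were collected.
 * Hypothetical helper, not part of any library.
 */
function loadXmlQuietly(string $xml): array
{
    // Collect libxml errors internally instead of raising warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = simplexml_load_string($xml); // false on a parse failure
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object with message and line properties.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();

    // Restore whatever error-handling mode was active before.
    libxml_use_internal_errors($previous);

    return [$doc, $messages];
}

// Sample input modelled on the attribute from the question.
[$doc, $messages] = loadXmlQuietly('<project campaign_name="International Relief & Development" />');
if ($doc === false) {
    print_r($messages);
}
```

Collecting the messages this way makes it possible to log them or decide whether to attempt a repair, rather than discarding them silently.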

9 Answers

176

The XML is most probably invalid. The problem is likely an unescaped "&".

$text = preg_replace('/&(?!#?[a-z0-9]+;)/', '&amp;', $text);

This replaces every "&" that is not already part of an entity reference with "&amp;", its escaped form. Give it a try.
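Put together, the approach might look like the sketch below. It assumes the XML is first read into a string (for example with `file_get_contents`) and then parsed with `simplexml_load_string` instead of `simplexml_load_file`; `escapeBareAmpersands` is a hypothetical helper name and the sample string is made up for illustration.

```php
<?php
/**
 * Escape "&" characters that do not already start an entity reference.
 * Hypothetical helper; the pattern is the one used in this answer.
 */
function escapeBareAmpersands(string $xml): string
{
    return preg_replace('/&(?!#?[a-z0-9]+;)/', '&amp;', $xml);
}

// Stand-in for the string fetched from the remote URL.
$raw = '<project campaign_name="International Relief & Development" note="already &amp; escaped" />';

$project = simplexml_load_string(escapeBareAmpersands($raw));

// Attribute values come back decoded.
echo $project['campaign_name'], "\n";
echo $project['note'], "\n";
```

Two caveats: the pattern is case-sensitive, so a reference with uppercase characters such as `&#x1F;` would be escaped a second time unless the `i` modifier is added, and it leaves any `&name;` sequence alone, including names like `&nbsp;` that XML itself does not define.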

ricricucit
  • The best practice when working with XML is to ensure there are no conflicting characters; you should replace them before parsing – Mr Megamind Jul 05 '17 at 09:56
  • Thanks, the main point of this question is that the XML is invalid – yussan Oct 01 '17 at 14:48
  • Global search is implicit in preg_replace. If you add 'g' at the end you get a warning: Warning: preg_replace(): Unknown modifier 'g' – ReaperSoon Jul 20 '21 at 20:40
  • Don't add 'g' at the end. It causes a warning: preg_replace(): Unknown modifier 'g'. Please edit the original answer back. – T.O.M. Feb 01 '22 at 11:22
  • You can now use `\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);`. See https://phpword.readthedocs.io/en/latest/general.html#output-escaping – Lionel Ding Feb 03 '22 at 12:27
98

Found this here ...

Problem: An XML parser returns the error “xmlParseEntityRef: noname”

Cause: There is a stray ‘&’ (ampersand character) somewhere in the XML text eg. some text & some more text

Solution:

  • Solution 1: Remove the ampersand.
  • Solution 2: Encode the ampersand (that is, replace the & character with &amp;). Remember to decode it when reading the XML text, unless your parser already does so.
  • Solution 3: Use CDATA sections (text inside a CDATA section is not parsed as markup), e.g. <![CDATA[some text & some more text]]>

Note: '&', '<' and '>' will all give problems if not handled correctly.
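As a rough illustration of Solutions 2 and 3, the sketch below parses a made-up document in which one element escapes the ampersand and another wraps the raw text in a CDATA section. The element names are hypothetical. With SimpleXML, both come back as plain text when the element is cast to a string, so no manual decoding step is needed in this case.

```php
<?php
// Hypothetical sample document: one element uses the &amp; entity,
// the other wraps raw text (including markup characters) in CDATA.
$xml = <<<XML
<items>
    <escaped>some text &amp; some more text</escaped>
    <wrapped><![CDATA[some text & some more <b>text</b>]]></wrapped>
</items>
XML;

$items = simplexml_load_string($xml);

// Casting an element to string yields its decoded text content.
$escaped = (string) $items->escaped;
$wrapped = (string) $items->wrapped;

echo $escaped, "\n";
echo $wrapped, "\n";
```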

FluffyKitten
King'ori Maina
  • Do we know why this is? Also, will a CDATA section still be picked up by a browser that would render some of this data? I have some HTML tags inside my XML tags and I need them to be rendered to the end user for an editing tool. – sulimmesh Feb 29 '16 at 19:16
13

Try to clean the HTML first using this function:

$html = htmlspecialchars($html);

Special characters are represented differently in markup, and a raw one can confuse the parser. For example, & becomes &amp;.

Aminah Nuraini
Ufuk Özdemir
  • Can someone explain why this is downvoted? `htmlspecialchars()` is the precise function to convert `&, ", <, >` chars in the element data. – JacobRossDev Oct 26 '16 at 18:19
  • This answer is downvoted because it doesn't work well in this case. Using that function will totally break your XML by converting "<" to "&lt;". I'm unaware of any way that you can use `htmlspecialchars()` and not break XML. I tried a few flags and my XML still broke. – Alex Finnarn Oct 04 '17 at 22:14
  • You should use `htmlspecialchars` on the content of an XML tag, not on the whole XML – gbalduzzi Jul 23 '19 at 07:50
  • This answer helped me a lot. And yes, as @gbalduzzi said: only use it on the content. – Martin Oct 05 '21 at 15:32
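The point made in the comments, escaping only the content and never the whole document, can be sketched as follows. This applies when you control the code that builds the XML; the variable names and values are made up for illustration. `ENT_XML1` asks `htmlspecialchars` for XML escaping rules and `ENT_QUOTES` also escapes both quote characters, which matters inside attribute values.

```php
<?php
// Hypothetical values that would be placed into the document.
$campaign = 'International Relief & Development';
$note     = 'Budget < 10k & rising';

// Escape each value on its own, then insert it into the markup.
// Escaping the finished document instead would also turn the tags into text.
$xml = sprintf(
    '<project campaign_name="%s"><note>%s</note></project>',
    htmlspecialchars($campaign, ENT_XML1 | ENT_QUOTES, 'UTF-8'),
    htmlspecialchars($note, ENT_XML1 | ENT_QUOTES, 'UTF-8')
);

$project = simplexml_load_string($xml);

echo $xml, "\n";            // the escaped markup
echo $project->note, "\n";  // the decoded text content
```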
9

I use a combined version :

strip_tags(preg_replace("/&(?!#?[a-z0-9]+;)/", "&amp;",$textorhtml))
Gama11
Reign.85
9

PROBLEM

  • The PHP function simplexml_load_file throws the parsing error "parser error : xmlParseEntityRef" while trying to load the XML file from a URL.

CAUSE

  • The XML returned by the URL is not valid. It contains a bare & instead of &amp;. There may well be other errors that aren't obvious at this point.

THINGS OUT OF OUR CONTROL

  • Ideally, we should make sure that valid XML is fed into the PHP simplexml_load_file function, but it looks like we don't have any control over how the XML is created.
  • It is also not possible to force simplexml_load_file to process an invalid XML file. That leaves us with few options other than fixing the XML itself.

POSSIBLE SOLUTION

Convert the invalid XML to valid XML. This can be done using the PHP tidy extension. Further instructions can be found at http://php.net/manual/en/book.tidy.php

Once you are sure that the extension exists or is installed, please do the following.

/**
 * As per the question asked, the URL is loaded into a variable first, 
 * which we can assume to be $xml
 */
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<project orderno="6" campaign_name="International Relief & Development for under developed nations">
    <invalid-data>Some other data containing & in it</invalid-data>
    <unclosed-tag>
</project>
XML;

/**
 * Whenever we use tidy it is best to pass some configuration options 
 * similar to $tidyConfig. In this particular case we are making sure that
 * tidy understands that our input and output is XML.
 */
$tidyConfig = array (
    'indent' => true,
    'input-xml' => true, 
    'output-xml' => true,
    'wrap' => 200
);

/**
 * Now we can use tidy to parse the string and then repair it.
 */
$tidy = new tidy;
$tidy->parseString($xml, $tidyConfig, 'utf8');
$tidy->cleanRepair();

/**
 * If we output the repaired XML string by echoing $tidy, it should look like this:

 <?xml version="1.0" encoding="utf-8"?>
 <project orderno="6" campaign_name="International Relief &amp; Development for under developed nations">
      <invalid-data>Some other data containing &amp; in it</invalid-data>
      <unclosed-tag></unclosed-tag>
 </project> 

 * As you can see, the & is now fixed in the campaign_name attribute
 * and also within the invalid-data element. You can also see that the
 * <unclosed-tag>, which didn't have a closing tag, has been fixed too.
 */
echo $tidy;

/**
 * Now we can use simplexml_load_string to load the clean XML. When we
 * print_r the result, it should look something like the output below.

 SimpleXMLElement Object
(
    [@attributes] => Array
        (
            [orderno] => 6
            [campaign_name] => International Relief & Development for under developed nations
        )

    [invalid-data] => Some other data containing & in it
    [unclosed-tag] => SimpleXMLElement Object
        (
        )

)

 */
// Cast explicitly so that a string is passed even under strict_types.
$simpleXmlElement = simplexml_load_string((string) $tidy);
print_r($simpleXmlElement);

CAUTION

The developer should compare the invalid XML with the valid XML generated by tidy to check that there are no adverse side effects. Tidy usually does a very good job, but it never hurts to inspect the result visually and be sure. In our case it should be as simple as comparing $xml with $tidy.

Kamal Soni
7

The XML is invalid.

<![CDATA[ 
{INVALID XML}
]]> 

Content containing special XML characters should be wrapped in a CDATA section, as per the W3C.

Edwin Daniels
3

This is indeed due to special characters interfering with the data. Using htmlentities($yourText) worked for me (I had HTML code inside the XML document). See http://uk3.php.net/htmlentities.

Guillaume
2

This solved my problem:

$description = strip_tags($value['Description']);
$description = preg_replace('/&(?!#?[a-z0-9]+;)/', '&amp;', $description);
$description = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $description);
$description = str_replace(' & ', ' &amp; ', html_entity_decode(htmlspecialchars_decode($description)));
Malki Mohamed
1

If you are getting this issue with OpenCart, try editing

catalog/controller/extension/feed/google_sitemap.php. For more information on how to do it, refer to: xmlparseentityref-no-name-error