I'm currently having a problem importing a large XML file and I can't work out why. We get an XML output from a partner that is around 443MB in size. The error that I get is as follows:
PHP Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : internal error in /home/imports/catalog.php on line 54
Warning: SimpleXMLElement::__construct(): Entity: line 1: parser error : internal error in /home/imports/catalog.php on line 54
PHP Warning: SimpleXMLElement::__construct(): ch to marriage, parenting, entrepreneurship, etc will be significantly upgraded. in /home/imports/catalog.php on line 54
Warning: SimpleXMLElement::__construct(): ch to marriage, parenting, entrepreneurship, etc will be significantly upgraded. in /home/imports/catalog.php on line 54
PHP Warning: SimpleXMLElement::__construct():
^ in /home/imports/catalog.php on line 54
Warning: SimpleXMLElement::__construct():
^ in /home/imports/catalog.php on line 54
PHP Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /home/imports/catalog.php:54
Stack trace:
#0 /home/imports/catalog.php(54): SimpleXMLElement->__construct('<?xml version="...')
#1 {main}
thrown in /home/imports/catalog.php on line 54
Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in /home/imports/catalog.php:54
Stack trace:
#0 /home/imports/catalog.php(54): SimpleXMLElement->__construct('<?xml version="...')
#1 {main}
thrown in /home/imports/catalog.php on line 54
Line 54 of the code is simply:
$xml = new SimpleXMLElement(file_get_contents($_CFG_XML_URL));
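For reference, a minimal sketch of how the libxml errors could be collected with their line and column numbers, rather than emitted as truncated warnings. The function name parseXmlWithErrors is hypothetical and not part of my script; the libxml calls are the standard PHP ones. Since the file is a single line, the reported column should give a rough offset of the failure.

```php
<?php
// Hypothetical helper: parse an XML string and return the parsed document
// (or null on failure) together with any libxml errors as readable strings.
function parseXmlWithErrors($xmlString)
{
    // Buffer libxml errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($xmlString);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$xml === false ? null : $xml, $errors];
}
```

It would be called as list($xml, $errors) = parseXmlWithErrors(file_get_contents($_CFG_XML_URL)); and the $errors array printed if $xml is null.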
As far as I can tell, the error appears to be in the element containing the text "ch to marriage, parenting, entrepreneurship, etc will be significantly upgraded." Unfortunately this is a long way into the file, and its size makes the contents difficult to inspect. My large-file reader reads one line at a time, and this XML is all on a single line, so the reader can't handle it gracefully, even on a workstation with 32GB RAM and a 64-bit editor.
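One way around the single-line problem would be to search the file in fixed-size chunks and print only the bytes around the text from the warning. This is a rough sketch, not something from my import script: findContext, its parameters, and the default sizes are all made up for illustration.

```php
<?php
// Hypothetical helper: locate $needle in a large file and return it with
// $context bytes on either side, without loading the whole file into memory.
function findContext($path, $needle, $context = 200, $chunkSize = 1048576)
{
    if ($needle === '') {
        return null;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }

    $overlap = strlen($needle) - 1; // bytes carried over so matches spanning chunks are found
    $carry = '';
    $bufferOffset = 0;              // file offset of the first byte of the current buffer
    $result = null;

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $buffer = $carry . $chunk;
        $pos = strpos($buffer, $needle);
        if ($pos !== false) {
            // Re-read the window around the match directly from the file.
            $absolute = $bufferOffset + $pos;
            $start = max(0, $absolute - $context);
            fseek($handle, $start);
            $result = fread($handle, ($absolute - $start) + strlen($needle) + $context);
            break;
        }
        // Keep the tail of the buffer in case the needle straddles the boundary.
        $carry = $overlap > 0 ? substr($buffer, -$overlap) : '';
        $bufferOffset += strlen($buffer) - strlen($carry);
    }

    fclose($handle);
    return $result;
}
```

For example, findContext('/path/to/download.xml', 'etc will be significantly upgraded', 2000) would return roughly 4KB around that text, where the path is a placeholder for wherever the download is saved.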
I've tried redownloading the file a few times but the problem is always the same. I've doubled the available memory for the script and it still fails in the same place.
So, I got on to the partner and asked for the XML for this particular item and they provided the following:
<EBook EAN="9792219192201">
<Title>Success-a-Phobia</Title>
<SubTitle>Discovering And Conquering Mankinds Most Persuasive, but Unknown, Phobia</SubTitle>
<Publisher>The Benjamin Consulting Group, LLC</Publisher>
<PublicationDate>29/09/2012</PublicationDate>
<Contributors>
<Contributor Code="A01" Text="By (author)">Benjamin, Marcus D.</Contributor>
</Contributors>
<Formats>
<Format Type="6"/>
</Formats>
<ShortDescription>People today still desire to be successful in matters of family, finance or business even though we are in the midst of major social, political and economic challenges. Have you every been at that moment where you wanted to do something significant, yet you were paralyzed from making the necessary choices to realize your dream? Have you experienced failure and are now sitting in the stands, paralyzed from getting back in the &quote;game of life?&quote; Are you at the verge of a major decision that could affect your life for many years? If you are in this category, this is your book of the year! With humor, real-life antidotes, real-life examples and solid narration, Marcus Benjamin will guide you toward discovering the most pervasive, yet unknown, phobia in the history of mankind. Once this phobia is discovered, the second half of the book shows you how to rid yourself of this phobia for good. Not only will this book impact your life, but your approach to marriage, parenting, entrepreneurship, etc will be significantly upgraded.</ShortDescription>
</EBook>
Nothing about that XML rings any alarm bells for me, but PHP is clearly having a problem partway through it. The failure appears to be about 978 characters into the element content, but I can't see anything significant about that position.
The PHP script is running from the command line on an Amazon EC2 instance. The OS is Amazon Linux (RHEL-based).
So, basically, I'm stuck. Has anyone any ideas what could be causing this problem?