
I am having trouble handling memory efficiently in a PHP script I have written that parses XML and then imports it into a database.

There is a very large number of XML files, roughly 4-5 million. The arrays of parsed data that I'm building inside the parser are becoming too large, and the script just gets killed by the server.

Currently my script does the following:

1) Load each XML file individually
2) Store the relevant data in objects in an array
3) Go through all objects and build the SQL query
4) Use the PDO driver to start a transaction and import the data

The arrays become so big that the script just says 'Killed'. In this scenario, is the only option to buy a more powerful server, or is there a way to optimize these massive arrays so that the script still runs fast but doesn't crash?
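Here is a stripped-down sketch of the current flow (the paths, the table and column names, and the fields pulled from each file are placeholders, but the shape is the same):

```php
<?php
// Simplified version of the current approach: every file's data is
// accumulated into one array before a single big insert, which is what
// eventually exhausts memory.
$records = [];

// 1) Load each XML file individually
foreach (glob('/path/to/xml/*.xml') as $file) {
    $xml = simplexml_load_file($file);

    // 2) Store the relevant data in objects in an array
    $records[] = (object) [
        'id'    => (string) $xml->id,
        'name'  => (string) $xml->name,
        'price' => (string) $xml->price,
    ];
}

// 3) Go through all objects and build one SQL query
$placeholders = [];
$values       = [];
foreach ($records as $record) {
    $placeholders[] = '(?, ?, ?)';
    array_push($values, $record->id, $record->name, $record->price);
}
$sql = 'INSERT INTO items (id, name, price) VALUES ' . implode(', ', $placeholders);

// 4) Use the PDO driver to start a transaction and import the data
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->beginTransaction();
$pdo->prepare($sql)->execute($values);
$pdo->commit();
```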

jskidd3
  • Marked as a duplicate yet the other question has answers that don't answer my question... – jskidd3 Dec 22 '14 at 15:32
  • Is it necessary to hold the data of all XML files in memory at once, or could you split the job into a lot of smaller subtasks and run them after each other to save memory? (Eg, if instead of 1 job x 5 million XML files you run a million jobs x 5 files, your memory requirements should drop significantly) – Wilbo Baggins Dec 22 '14 at 15:32
  • @WilboBaggins That's a great idea, I hadn't thought of that. Could you describe what you mean with a real PHP example perhaps? Do you mean running a PHP file many times instead or running a PHP file that loads many XML files? – jskidd3 Dec 22 '14 at 15:33
  • Also @MarkB, the 4-5 million XML files are all very small, not large. The question you have marked as a duplicate is to do with handling very large XML files, not to do with handling lots of small ones (and no, they don't have similar answers) – jskidd3 Dec 22 '14 at 15:35
  • jskidd3, they are in principle the same issue; you run out of memory because you force yourself to hold it all in memory at once. The solution is to 'stream' through the data, i.e. read/write parts at a time, so it is sort of a duplicate. Re-use variables instead of accumulating, e.g. foreach($xmlFiles as $xmlFile) { $data = file_get_contents($xmlFile); /* parse this file's data into the database */ } (a fuller sketch follows the comments) – Wilbo Baggins Dec 22 '14 at 15:59
  • I would suggest you use XMLReader (http://php.net/manual/en/book.xmlreader.php). This copes well with large XML files, although the syntax can be a little odd to work with. It is possible to use XMLReader to walk through the XML file, but then use SimpleXML to process each section in turn. For example, say you had an XML file of 5 million products: you would use XMLReader to get to each product in turn, and then pass each individual product's XML to SimpleXML (which should have no problem coping with one product's details); a sketch of this follows the comments. – Kickstart Dec 22 '14 at 17:12
  • If it's lots of small files then possibly just load them into a temp database table, and once loaded, process from that (a rough sketch of this also follows the comments). – Kickstart Dec 22 '14 at 17:25
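
A minimal sketch of the streaming/batching approach Wilbo Baggins describes above: each file is parsed and inserted on its own, variables are re-used instead of accumulated, and the transaction is committed in batches. The table/column names, element names and batch size are assumptions; FilesystemIterator is used instead of glob() so that the 4-5 million file names are never all held in memory either.

```php
<?php
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO items (id, name, price) VALUES (?, ?, ?)');

$batchSize = 1000;  // assumed batch size; tune to taste
$count     = 0;

$pdo->beginTransaction();
foreach (new FilesystemIterator('/path/to/xml') as $fileInfo) {
    if ($fileInfo->getExtension() !== 'xml') {
        continue;
    }

    // Only this one file's data is ever in memory; $xml is overwritten
    // on each pass instead of being appended to a giant array.
    $xml = simplexml_load_file($fileInfo->getPathname());
    $stmt->execute([(string) $xml->id, (string) $xml->name, (string) $xml->price]);

    // Commit every $batchSize files rather than building one enormous query.
    if (++$count % $batchSize === 0) {
        $pdo->commit();
        $pdo->beginTransaction();
    }
}
if ($pdo->inTransaction()) {
    $pdo->commit();
}
```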
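Next, a sketch of the XMLReader + SimpleXML combination Kickstart suggests, for the variant where one file holds many records rather than one record per file (the file name and the <product> element name are assumptions):

```php
<?php
$reader = new XMLReader();
$reader->open('products.xml');

// Stream through the document node by node; only the current <product>
// subtree is ever expanded into memory.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'product') {
        // Hand this single <product> to SimpleXML for convenient access.
        $product = simplexml_import_dom($reader->expand());

        // ... build/execute the insert for this one product here ...
        echo (string) $product->name, PHP_EOL;
    }
}
$reader->close();
```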
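Finally, a rough sketch of Kickstart's temp-table idea: stage the raw XML of each small file in the database first, then process it from there in whatever chunk size suits. The xml_staging table and its columns are assumptions.

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->exec('CREATE TABLE IF NOT EXISTS xml_staging (filename VARCHAR(255) NOT NULL, body MEDIUMTEXT NOT NULL)');
$stmt = $pdo->prepare('INSERT INTO xml_staging (filename, body) VALUES (?, ?)');

foreach (new FilesystemIterator('/path/to/xml') as $fileInfo) {
    if ($fileInfo->getExtension() !== 'xml') {
        continue;
    }
    // Store each file's raw XML; nothing is parsed or accumulated in PHP.
    $stmt->execute([$fileInfo->getFilename(), file_get_contents($fileInfo->getPathname())]);
}
// A separate pass (or SQL itself) can then work through xml_staging in chunks.
```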

0 Answers