
I am going to build a script that handles a really big XML file. My question is: when do you have to move to a pull parser instead of using DOM or SimpleXML?

When the XML file is loaded into memory, is it loaded again for every user that requests data from it (even simultaneously), or just once for a period of time?

Should I follow the "chunk method", or avoid it and use SAX/XMLReader?

Thank you.

EnexoOnoma
  • when you notice it negatively impacts your application – Gordon Jul 21 '11 at 10:49
  • But is it worth it to code your app using SimpleXML and one month later to re-code it with XMLReader? – EnexoOnoma Jul 21 '11 at 10:57
  • Same answer as before :) But in general, when you have huge files and/or environments where speed and memory consumption matter, it is better to use a pull parser like XMLReader or even the event-based XML Parser. See http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662 – Gordon Jul 21 '11 at 11:05
  • 1
    sorry, linked the wrong one above. Check this one [Best XML Parser for PHP](http://stackoverflow.com/questions/188414/best-xml-parser-for-php/3616044#3616044) – Gordon Jul 21 '11 at 11:26

1 Answer


Tree-based representations of XML, such as DOM, generally use 5 to 10 times the source document size in memory. So it becomes a pain above 200 MB or so (or less, of course, if you're trying to handle high concurrency and throughput).
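By contrast, a pull parser keeps only the current node in memory, so peak usage stays roughly flat regardless of document size. A minimal sketch with PHP's XMLReader follows; the inline document and the `<item>`/`<title>` element names are assumptions for illustration:

```php
<?php
// Pull-parsing sketch: XMLReader streams through the document, and only the
// one <item> subtree we expand at a time is materialised in memory.
$xml = '<catalog><item><title>A</title></item><item><title>B</title></item></catalog>';

$reader = new XMLReader();
$reader->XML($xml); // for a real file, use $reader->open('huge.xml') instead

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Expand just this subtree into SimpleXML for convenient access.
        $item = simplexml_load_string($reader->readOuterXml());
        echo $item->title, "\n";
        // Jump to the next <item> sibling instead of re-reading its children.
        $reader->next('item');
    }
}
$reader->close();
```

Mixing the two APIs this way is a common compromise: XMLReader handles the streaming, while each small subtree still gets SimpleXML's convenient property access.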

If you want to avoid repeatedly loading the same document into memory, your application should maintain a cache.

Michael Kay