Possible Duplicates:
PHP what is the best approach to using XML? Need to create and parse XML responses
Parse big XML in PHP

Hello Community,

I am writing an application that needs to parse XML files that can be 1000 MB or more in size.

I have tried a few code samples available on the internet. Because the files are so large, they contain a huge number of XML tags, so the parsing loop gets slower and slower as time elapses.

So, I need a parser that:

  • keeps performing well over time as parsing proceeds
  • doesn't load the whole XML file into memory

I know about the following XML parsers, but I'm not sure which one to use, and why:

  1. XML Parser
  2. SimpleXML
  3. XMLReader

I am using PHP 5.3, so please help me guys and gals, to choose the parser.

You can also suggest other options or classes.

Thanks.

EDIT

I would also like to know about SAX (Simple API for XML) and StAX implementations in PHP.

I-M-JM
  • 100MB or 1000MB? If it's the former, you could load it in memory and make it a lot faster. –  Apr 12 '11 at 08:41
  • more http://stackoverflow.com/search?q=parse+large+xml+php – Gordon Apr 12 '11 at 09:00
  • An awful lot depends on what you want to do with the data once it has been parsed. You don't say. – Michael Kay Apr 12 '11 at 09:10
  • @I-M-JM how about editing your question to ask something that wasn't already asked before? The difference between the various parsers is easily answered in the linked questions above. Don't be lazy and expect people to reiterate/summarize them for you. If you want answers, point out why none of those above answer your question. – Gordon Apr 12 '11 at 09:45
  • @Gordon: I have edited, please look at question again. – I-M-JM Apr 12 '11 at 09:49
  • My previous comment took that edit into account already. – Gordon Apr 12 '11 at 09:52
  • @Gordon: I had a look at each and every article, and the links in them too. I didn't find any info about SAX or StAX anywhere. – I-M-JM Apr 12 '11 at 10:20
  • Then you didn't look hard enough, because the info is there. – Gordon Apr 12 '11 at 10:30
  • @Gordon: please help me out and tell me where it is mentioned. – I-M-JM Apr 12 '11 at 10:33
  • All of them either mention SAX-based parsers directly or link to resources with information about them. And all of them sufficiently answer your question of which parser to use for large XML files. – Gordon Apr 12 '11 at 10:45
  • @Gordon: I'll go back and have another look at all the links. – I-M-JM Apr 12 '11 at 10:49

2 Answers

5

First of all, you can't load that much XML into memory. The exact limit depends on your machine, but an XML file larger than 10-20 MB is generally too much. The server may be able to handle more, but it's not a good idea to fill all the memory with one script. So you can rule out SimpleXML and DOM from the start, since both build the whole document in memory.

The other two options, XML Parser and XMLReader, will both work. XMLReader is the newer extension, so it is probably the better choice. Be warned, though, that XMLReader also allows you to load everything into memory. Don't do that. Instead, use it as a node-by-node parser and read/process your data in small pieces.
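
As a rough sketch of the node-by-node approach with XMLReader: the helper `streamRecords()` below is hypothetical (the function name, the callback signature, and the assumption that your file is made of many repeated record elements are mine, not something from the question). It walks the file with `read()` and hands each matching element to a callback as a small SimpleXMLElement, so only one record is held in memory at a time. It sticks to PHP 5.3 syntax.

```php
<?php
// Hypothetical helper (name and signature are illustrative only).
// Streams a large XML file and passes every <$tagName> element to $callback.
// Returns the number of elements that were processed.
function streamRecords($path, $tagName, $callback)
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open XML file: $path");
    }

    $count = 0;
    // read() moves the cursor forward one node at a time;
    // the document as a whole is never built in memory.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $tagName) {
            // readOuterXml() returns the markup of the current element only,
            // so the SimpleXMLElement below is one small record, not the file.
            $record = new SimpleXMLElement($reader->readOuterXml());
            call_user_func($callback, $record);
            $count++;
            // Drop the record before moving on to the next one.
            unset($record);
        }
    }

    $reader->close();
    return $count;
}
```

You would call it with something like `streamRecords('big.xml', 'item', function ($item) { echo $item->name, "\n"; });`, where the file and tag names are made up. One caveat of this simple loop: the reader still descends into each record after handing it over, so if a record can contain a nested element with the same name, that nested element is reported as well.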

Your problem may go beyond the scope of choosing a parser if you need most of the data from the XML. You should also make sure that you don't load it all up in memory and use it at the end of the script. Instead use it as you get it and dispose of it once you no longer need it.
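
To illustrate "use it as you get it", here is a minimal sketch built on the other option, the event-driven `xml_*` functions (the XML Parser extension, which is PHP's SAX-style API). The helper name `sumElementText()`, the default chunk size, and the idea that you want a numeric total are all assumptions made for the example. The file is fed to the parser in small chunks, and the only state kept between events is the text of the element currently being read and a running total.

```php
<?php
// Hypothetical helper (illustrative only): adds up the numeric text of every
// <$tagName> element in a large file without keeping the elements around.
function sumElementText($path, $tagName, $chunkSize = 8192)
{
    $total = 0.0;
    $buffer = null; // holds text only while we are inside a matching element

    $parser = xml_parser_create();
    // Report tag names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Start tag: begin collecting text if this is an element we care about.
        function ($p, $name, $attrs) use (&$buffer, $tagName) {
            if ($name === $tagName) {
                $buffer = '';
            }
        },
        // End tag: use the collected value right away, then discard it.
        function ($p, $name) use (&$buffer, &$total, $tagName) {
            if ($name === $tagName && $buffer !== null) {
                $total += (float) $buffer;
                $buffer = null;
            }
        }
    );
    // Text may arrive in several pieces, so append rather than assign.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$buffer) {
        if ($buffer !== null) {
            $buffer .= $data;
        }
    });

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open XML file: $path");
    }
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            throw new RuntimeException("Read error on: $path");
        }
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($handle))) {
            $message = xml_error_string(xml_get_error_code($parser));
            fclose($handle);
            throw new RuntimeException("XML error: $message");
        }
    }
    fclose($handle);

    return $total;
}
```

Memory use stays roughly constant here regardless of file size, because nothing except the running total survives past the end tag of each element. The same idea applies if you insert rows into a database or write to another file instead of summing.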

Alin Purcaru
0

Load your giant XML files into an XML database and perform your queries and manipulations through its XQuery/XSLT interfaces.

http://www.xml.com/pub/a/2003/10/22/embed.html

Mads Hansen