34

I have tried many of the Perl XML parsers. I was quite interested in the Sablotron parser, but it is a pain to install on a Windows box. I have recently started using XML::LibXML and XML::LibXSLT, both of which seem to do everything I need.

They seem to be quite standard as well. Are there any better XML parsers to use than these?

brian d foy
Xetius
  • 2
    The "best" XML parser is the one that meets your needs. You did not mention the type of XML parsing that you need: linear (SAX), tree (DOM), iterative (pull parser), etc., so offering suggestions will be difficult. – Mr. Muskrat Jan 28 '09 at 16:52

8 Answers

25

I think you are using a pretty good one. XML::LibXML, Matt Sergeant and Christian Glahn's Perl interface to Daniel Veillard's libxml2, is one of the faster XML parsers that I know of.
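For readers who have not used it, here is a minimal sketch of tree-style (DOM plus XPath) parsing with XML::LibXML. The sample document and its element names (`catalog`, `book`, `title`) are made up for illustration and are not from the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; element and attribute names are illustrative only.
my $xml = <<'XML';
<catalog>
  <book id="b1"><title>First Title</title></book>
  <book id="b2"><title>Second Title</title></book>
</catalog>
XML

# Parse the whole document into an in-memory tree.
my $doc = XML::LibXML->load_xml(string => $xml);

# Select nodes with XPath, then read an attribute and a child element's text.
my @titles;
for my $book ($doc->findnodes('/catalog/book')) {
    my $id    = $book->getAttribute('id');
    my $title = $book->findvalue('title');
    push @titles, "$id: $title";
}

print "$_\n" for @titles;
```

To read from a file instead, `load_xml` also accepts `location => $path`.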

mmcdole
12

It really depends on your needs, as people have said. To parse XML files of roughly 100 MB each (gene annotations from TAIR, one file per chromosome), I used mirod's XML::Twig module, which lets you set callbacks for the elements that interest you and presents each sub-document as an XML::Simple tree. It combines the benefits of a SAX parser (scanning the file as a stream) with those of a DOM parser (working more easily with the interesting pieces).
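The callback approach can be sketched roughly as follows. This is an illustration rather than the code I actually used: the `gene` and `name` elements and their values are hypothetical, and the input is a small in-memory string instead of a large file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; real annotation files would be far larger.
my $xml = <<'XML';
<annotations>
  <gene id="g1"><name>alpha</name></gene>
  <gene id="g2"><name>beta</name></gene>
</annotations>
XML

my @seen;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once per <gene>, as soon as the element has been fully parsed.
        gene => sub {
            my ($t, $gene) = @_;
            push @seen, $gene->att('id') . ' ' . $gene->first_child_text('name');
            $t->purge;    # release what has been parsed so far to keep memory flat
        },
    },
);

$twig->parse($xml);    # for a file on disk, use $twig->parsefile($path) instead
print "$_\n" for @seen;
```

Calling `purge` inside the handler is what keeps memory use low on large inputs: each sub-document is discarded once it has been processed.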

Benjamin W.
Dotan Dimet
10

If you need speed, power or features, XML::LibXML is the way to go. If you're after ease of use, though, XML::Simple is a viable alternative.
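As a rough illustration of the ease-of-use side, XML::Simple turns a document into plain Perl hashes and arrays in one call. The sample data below is hypothetical. The `:strict` import forces the `ForceArray` and `KeyAttr` options to be stated explicitly, which avoids some of the surprises the defaults can cause.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# Hypothetical sample document; names are illustrative only.
my $xml = <<'XML';
<catalog>
  <book id="b1"><title>First Title</title></book>
  <book id="b2"><title>Second Title</title></book>
</catalog>
XML

# ForceArray: always represent <book> as an array, even if only one is present.
# KeyAttr => []: do not fold the list into a hash keyed on an attribute.
my $data = XMLin($xml, ForceArray => ['book'], KeyAttr => []);

my @titles = map { "$_->{id}: $_->{title}" } @{ $data->{book} };
print "$_\n" for @titles;
```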

Joe Casadonte
  • Yes. Beware though: just because it's called Simple doesn't mean you're not supposed to read the documentation. – innaM Jan 28 '09 at 13:51
  • Indeed. XML::Simple is probably one of the most featureful 'simple' parsing tools I've used in a long time. :) – Robert P Jan 28 '09 at 17:47
5

(Actually it's not an answer, but a comment - however, I cannot comment...)

XML::Simple has been mentioned here.
(I know this question is from a few years ago, but it came up in Google today...)

However, its page (http://metacpan.org/pod/XML::Simple) now says:

STATUS OF THIS MODULE

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces. In particular, XML::LibXML is highly recommended.

The major problems with this module are the large number of options and the arbitrary ways in which these options interact - often with unexpected results.

Patches with bug fixes and documentation fixes are welcome, but new features are unlikely to be added.

szabgab
Zvika
5

In my experience, XML::Simple is best for quick and dirty parsing of XML. We use it for parsing data from third parties that do not always conform to the XML standard. XML::Simple throws informative errors and gets you up and running extremely quickly.

aekeus
2

You could also look at XML::Liberal, which uses XML::LibXML underneath.

Peter Mortensen
singingfish
1

I think you should give XML::MyXML a try, too. It's very easy to use.

alexk
0

I'll offer one that SHOULD NOT be used: XML::Parser.

It automatically expands HTML entities to their UTF-8 equivalents, and the option to disable this behavior does not work on the most characteristic of all entities, &amp;.

Additionally, its XMLDecl-parser will interpret and display the standalone attribute in the <?xml ... ?> block as "standalone"="1", which is absolutely incorrect -- it should be "standalone"="yes".

HoldOffHunger