63

I see that there are quite a few XML processing libraries in Haskell.

  • HaXml seems to be the most popular (according to dons)
  • HXT seems to be the most advanced (but also the most difficult to learn thanks to arrows)
  • xml, which seems to be just a basic parser
  • HXML seems to be abandoned
  • tagsoup and tagchup
  • libXML and libXML SAX bindings

So, which library should I choose if I want it

  • to be reasonably powerful (to extract data from XML and to modify XML)
  • to be likely to remain supported for a long time
  • to be a “community choice” (default choice)

And while most of the above seem to be sufficient for my current needs, what are the reasons to choose one of them over the others?

UPD 20091222:

Some notes about licenses:

Don Stewart
sastanin

3 Answers

59

I would recommend:

  1. xml, if your task is simple
  2. haxml, if your task is complex
  3. hxt, if you like arrows
  4. hexpat, if you need high performance
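
For a sense of what the "simple task" case looks like, here is a minimal sketch using the xml package (module Text.XML.Light). The sample document and the helper name `titles` are made up for illustration; only the library calls (`parseXMLDoc`, `findElements`, `unqual`, `strContent`) come from the package itself.

```haskell
import Text.XML.Light

-- Hypothetical sample document, used only for this illustration.
sample :: String
sample =
  "<library>\
  \<book id=\"1\"><title>Learn You a Haskell</title></book>\
  \<book id=\"2\"><title>Real World Haskell</title></book>\
  \</library>"

-- Collect the text content of every <title> element in the document.
-- Returns an empty list if the input cannot be parsed.
titles :: String -> [String]
titles src =
  case parseXMLDoc src of
    Nothing   -> []
    Just root -> map strContent (findElements (unqual "title") root)

main :: IO ()
main = mapM_ putStrLn (titles sample)
```

Everything here is ordinary list and Maybe code, which is probably why this library suits small extraction jobs.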
Don Stewart
  • 1
    Thank you, Don. That's the kind of suggestion I was looking for. – sastanin Sep 02 '09 at 07:25
  • 4
    "likely to be supported long time in the future" I would definitely use Haxml. It is 10 years old, and the authors are very active. – Don Stewart Sep 02 '09 at 19:50
  • Well, this is an important reason to choose HaXml. Thanks! – sastanin Sep 03 '09 at 10:20
  • 5
    I've really benefited from the tutorial at: http://www.haskell.org/haskellwiki/HXT/Practical. Unlike most of the other tutorials I found, this one started with a basic XML document, showed you how to parse it and then added complexities slowly. – Tim Stewart Apr 11 '11 at 01:34
  • 1
    Another good hxt tutorial explaining also the concept of arrows very well: http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell.html – Stephan Kulla Mar 21 '14 at 12:51
  • 1
    Is this still true? I feel like I'm not smart enough to use HXT. – Carbon Aug 31 '17 at 13:50
14

HXT's main problem, aside from the unusual arrow syntax, is performance and memory usage. I have an app that spends 1.2 seconds processing about 1.5MB of XML, consuming about 2.3GB (!) of memory in the process. Libxml2 takes a few milliseconds on the same data. Extracting data via the css function and arrow predicates also seems very slow compared to Libxml2.

Alexander Staubo
  • Dunno if that's the problem here, but whether or not optimisation (-O2) is enabled can make a huge difference in some cases. – Julia Path Jul 27 '21 at 00:02
11

I would personally recommend HXT because it uses arrows, which are a very useful and powerful tool to learn, and an XML parsing library is the perfect use for arrows (they were first invented to solve various parsing problems that monads couldn't). Arrows are also starting to be used outside of pure functional programming, such as Arrowlets in JavaScript.
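
To give a flavour of the arrow style, here is a small sketch with HXT (module Text.XML.HXT.Core). The sample document and the name `titleText` are hypothetical; `readString`, `withValidate`, `deep`, `isElem`, `hasName`, `getChildren`, `getText` and `runX` are the library's own combinators.

```haskell
import Text.XML.HXT.Core

-- Hypothetical sample document, used only for this illustration.
sample :: String
sample =
  "<library>\
  \<book><title>Learn You a Haskell</title></book>\
  \<book><title>Real World Haskell</title></book>\
  \</library>"

-- An arrow that finds every <title> element anywhere in the tree
-- and yields the text of its child text nodes.
titleText :: ArrowXml a => a XmlTree String
titleText =
  deep (isElem >>> hasName "title") >>> getChildren >>> getText

main :: IO ()
main = do
  -- readString parses the document; validation is switched off
  -- because the sample has no DTD.
  found <- runX (readString [withValidate no] sample >>> titleText)
  mapM_ putStrLn found
```

The query is built by composing small filters with `>>>`, so each stage can be reused or swapped out independently.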

Will
  • 1
    Thanks, Will! That's why I started learning HXT, but I am also afraid that code written with HXT and arrows is less friendly for potential contributors. Also, it alarms me that HaXml is much more popular. – sastanin Sep 01 '09 at 12:45