
I'm trying to figure out how to parse some XML (for an Android app), and it seems pretty ridiculous how difficult it is to do in Java. It seems like it requires creating an XML handler which has various callbacks (startElement, endElement, and so on), and you have to then take care of changing all this data into objects. Something like this tutorial.

All I really need is to change an XML document into a multidimensional array, and even better would be to have some sort of Hpricot processor. Is there any way to do this, or do I really have to write all the extra code in the example above?

Kyle Slattery
  • If you are only interested in parsing (small) XML configuration files, I would suggest you take a look at [XPath](http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html). I usually work with that as it allows very easy access. The performance gets worse if you are working with large XML files though. – brimborium Jun 08 '12 at 09:59

13 Answers


There are two different types of processors for XML in Java (three actually, but one is weird). What you have is a SAX parser, and what you want is a DOM parser. Take a look at http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/ for how to use the DOM parser. DOM builds a tree that you can navigate pretty easily. SAX is best for large documents, but DOM is much easier to use, if slower and much more memory-intensive.
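A minimal sketch of the DOM approach, using only the standard `javax.xml.parsers` and `org.w3c.dom` APIs. The RSS-like layout (`<item>` elements with a `<title>` child) and the class name `DomTitles` are assumptions made for illustration. The whole document is parsed into a tree first and then navigated, which is where the extra memory cost comes from.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomTitles {
    // Builds the whole document as an in-memory tree, then navigates it.
    static List<String> itemTitles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item"); // every <item>, at any depth
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            NodeList t = item.getElementsByTagName("title"); // <title> descendants of this item
            if (t.getLength() > 0) {
                titles.add(t.item(0).getTextContent().trim());
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><channel><title>Feed</title>"
                + "<item><title>First</title></item>"
                + "<item><title>Second</title></item>"
                + "</channel></rss>";
        System.out.println(itemTitles(xml)); // prints [First, Second]
    }
}
```

The channel's own `<title>` is not collected because `getElementsByTagName` is called on each `<item>`, so it only searches that item's descendants.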

stimms

Try http://simple.sourceforge.net. It's an XML-to-Java serialization and binding framework; it's fully compatible with Android and very lightweight (270K, no dependencies).

ng.
  • This should be getting more upvotes, people; this is truly the best way to work with XML on Android. Use it. If you don't know how to include it in an Android project, then look at this blog post: http://massaioli.homelinux.com/wordpress/2011/04/21/simple-xml-in-android-1-5-and-up/ – Robert Massaioli May 05 '11 at 02:23
  • This is pretty much what Gson is to JSON in Java! :D Fantastic library! – Skela Aug 05 '12 at 12:54
  • Are you sure it has no dependencies? When I added Simple from Maven, xpp, stax and stax-api showed up alongside it: http://i.imgur.com/T3h7Pb1.png – Bala R Mar 20 '13 at 12:38

Kyle,

(Please excuse the self-promotey nature of this post... I've been working on this library for months and it's all open source/Apache 2, so not that self-serving, just trying to help).

I just released a library I'm calling SJXP or "Simple Java XML Parser" http://www.thebuzzmedia.com/software/simple-java-xml-parser-sjxp/

It is a very small/tight (4 classes) abstraction layer that sits on top of any spec-compliant XML Pull Parser.

On Android and non-Android Java platforms, pull parsing is probably one of the most performant methods of parsing, both in speed and in low memory overhead. Unfortunately, coding directly against a pull parser ends up looking a lot like any other XML parsing code (e.g. SAX): you have exception handlers, parser state to maintain, error checking, event handling, value parsing, etc.

What SJXP does is allow you to define XPath-like "paths" to the elements or attributes in a document whose values you want, like:

/rss/channel/title

and it will invoke your callback with the value when that rule matches. The API is really straightforward and has intuitive support for namespace-qualified elements, if that is what you are trying to parse.

The code for a standard parser would look something like this (an example that parses an RSS2 feed title):

IRule titleRule = new DefaultRule(Type.CHARACTER, "/rss/channel/title") {
    @Override
    public void handleParsedCharacters(XMLParser parser, String text) {
        // Store the title in a DB or something fancy
    }
};

Then you just create an XMLParser instance and give it all the rules you want it to care about:

XMLParser parser = new XMLParser(titleRule);
parser.parse(xmlStream);

And that's it: the parser will invoke the handler method every time the rule matches. You can stop parsing at any time by calling parser.stop() if you want.

Additionally (and this is the real win of this library), matching namespace-qualified elements and attributes is dead easy: you just add their namespace URI inside brackets, prefixing the name of the element in your path.

For example, say you want the value out of the <dc:language> element of an RSS feed so you can tell what language it is in (ref: http://web.resource.org/rss/1.0/modules/dc/). You just use the unique namespace URI for that 'language' element with the 'dc' prefix, and the rule path ends up looking like this:

/rss/channel/[http://purl.org/dc/elements/1.1/]language

The same goes for namespace-qualified attributes as well.

With all that ease, the only overhead you add to the parsing process is an O(1) hash lookup at each location of the XML document and a few hundred bytes (maybe 1k) for the internal location state of the parser.
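The rule idea described above can be sketched on a plain pull parser. This is a hypothetical illustration, not SJXP's API: `PathRules`, `on` and `parse` are invented names. The sketch keeps the current element path in a buffer, writes namespaced steps in the bracketed `[uri]localName` form, does one hash-map lookup at each end tag, and hands the element's text to the matching callback. It uses the JDK's StAX `XMLStreamReader`, which Android does not ship, so on Android the same loop would be written against `XmlPullParser`.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of path-based rules on a pull parser; NOT the SJXP API.
class PathRules {
    private final Map<String, Consumer<String>> rules = new HashMap<>();

    // path looks like "/rss/channel/title"; a namespaced step is "[uri]localName".
    PathRules on(String path, Consumer<String> callback) {
        rules.put(path, callback);
        return this;
    }

    void parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<Integer> lengths = new ArrayDeque<>(); // path length before each open element
        StringBuilder path = new StringBuilder();
        StringBuilder text = new StringBuilder();    // text since the last start/end tag
        try {
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        lengths.push(path.length());
                        path.append('/');
                        String ns = r.getNamespaceURI();
                        if (ns != null && !ns.isEmpty()) {
                            path.append('[').append(ns).append(']');
                        }
                        path.append(r.getLocalName());
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(r.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        // One hash lookup per element; delivers leaf-element text.
                        Consumer<String> cb = rules.get(path.toString());
                        if (cb != null) {
                            cb.accept(text.toString().trim());
                        }
                        path.setLength(lengths.pop());
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        List<String> seen = new ArrayList<>();
        new PathRules()
                .on("/rss/channel/title", t -> seen.add("title=" + t))
                .on("/rss/channel/[http://purl.org/dc/elements/1.1/]language",
                        t -> seen.add("lang=" + t))
                .parse("<rss xmlns:dc='http://purl.org/dc/elements/1.1/'><channel>"
                        + "<title>Feed</title><dc:language>en-us</dc:language>"
                        + "<item><title>Post</title></item></channel></rss>");
        System.out.println(seen); // prints [title=Feed, lang=en-us]
    }
}
```

Because rules match on the namespace URI rather than the prefix, a feed that binds the Dublin Core namespace to some other prefix would still match the same rule.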

The library works on Android with no additional dependencies (because the platform provides an org.xmlpull impl already) and in any other Java runtime by adding the XPP3 dependency.

This library is the result of many months of writing custom pull parsers for every kind of feed XML out there in every language and realizing (over time) that about 90% of parsing can be distilled down into this really basic paradigm.

I hope you find it handy.

Riyad Kalla

Starting with Java 5, there is an XPath library (javax.xml.xpath) in the SDK. See this tutorial for an introduction to it.
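A minimal sketch of the JDK XPath API: `XPath.evaluate(String, InputSource)` parses the input and returns the string value of the expression. The wrapper name `XPathLookup` and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathLookup {
    // Evaluates one XPath expression against an XML string and returns its string value.
    static String valueOf(String xml, String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<rss><channel><title>Feed</title></channel></rss>";
        System.out.println(valueOf(xml, "/rss/channel/title")); // prints Feed
    }
}
```

This overload re-parses the input on every call. When you run several expressions against the same document, you would typically parse it once into a DOM `Document` and pass that to `evaluate` instead.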

Hank Gay

In my opinion, you should use a SAX parser because:
- it's fast
- you can control everything in the XML document

You will spend more time coding, but only once, because you will create a code template for parsing XML.

After that, you only need to edit the parts that change.
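A minimal sketch of such a reusable SAX template, using the standard `org.xml.sax` and `javax.xml.parsers` APIs. The element name `title` and the class name `TitleHandler` are assumptions for illustration. The handler buffers text between `startElement` and `endElement` because `characters()` may be called several times for a single text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class TitleHandler extends DefaultHandler {
    final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The factory is not namespace-aware by default, so compare qName.
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            text.append(ch, start, length); // may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            titles.add(text.toString().trim());
            inTitle = false;
        }
    }

    // Streams through the document once; nothing but the collected titles is kept.
    static List<String> parse(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.titles;
    }
}
```

To reuse it as a template, copy the class and change only the element names and what happens in `endElement`.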

Good luck!

misamap

In my opinion, using XPath for parsing XML may be your easiest coding approach. You can embody the logic for pulling out nodes from an XML document in a single expression, rather than having to write the code to traverse the document's object graph.

Another answer to this question already suggests using XPath, but not for your Android project. As of right now, the XPath parsing classes are not yet supported in any Android release (even though the javax.xml namespace is defined in the Dalvik VM, which could fool you, as it did me at first).

Inclusion of the XPath classes in Android is a current work item in a late phase (it is being tested and debugged by Google as I write this). You can track the status of adding XPath to Dalvik here: http://code.google.com/p/android/issues/detail?id=515

(It's an annoyance that you cannot assume things supported in most Java VMs are included yet in the Android Dalvik VM.)

Another option, while waiting for official Google support, is JDOM, which presently claims Dalvik VM compatibility and also XPath support (in beta). (I have not checked this out; I'm just repeating current claims from their web site.)
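To illustrate the "single expression instead of traversing the object graph" point, here is a sketch that selects nodes with a predicate and gets a `NodeList` back from `javax.xml.xpath`. The `<item>`/`<name>`/`<price>` layout and the class name are invented for the example. `javax.xml.xpath` was later added to Android in API level 8 (Android 2.2), so the status described above has changed since it was written.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSelect {
    // One expression replaces a hand-written loop over the DOM tree.
    static List<String> namesPricedOver(String xml, int minPrice) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "//item[price > " + minPrice + "]/name";
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);

        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }
}
```

For example, on `<shop><item><name>A</name><price>5</price></item><item><name>B</name><price>20</price></item></shop>` with `minPrice` 10, only `B` is selected. The price comparison happens inside the XPath engine rather than in Java code.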

M.Bearden

You can try http://xml.jcabi.com/. It is an extra layer on top of DOM that allows simple parsing, printing, and transforming of XML documents and nodes.

George
  • Be careful. It has a lot of dependencies, and my Spring Boot app failed to start up because it detected something on the classpath (this lib was the only addition). – takacsot Mar 17 '17 at 13:45

I've created a really simple API to solve precisely this problem. It's just a single class that you can include in your code base, and it makes parsing any XML clean and easy. You can find it here:

http://argonrain.wordpress.com/2009/10/27/000/

Chris

There is a very good example of using XmlPullParser for any type of XML. It parses in a generic way, so you do not need to change anything; just get that class and put it into your Android project.

Generic XmlPullParser

Samdrain

You could also use Castor to map the XML to Java beans. I have used it before and it works like a charm.

Rahul

Writing a SAX handler is the best way to go, and once you do that you will never go back to anything else. It's fast and simple, and it crunches away as it goes, without sucking large parts (or, god forbid, a whole DOM) into memory.

Bostone

A couple of weeks ago I battered out a small library (a wrapper around javax.xml.stream.XMLEventReader) that lets you parse XML in a similar fashion to a hand-written recursive descent parser. The source is available on GitHub, and a simple usage example is below. Unfortunately, Android doesn't support this API, but it is very similar to the XmlPullParser API, which is supported, and porting wouldn't be too time-consuming.

// accept, atTag, attrib, expect and close are helpers provided by the wrapper library
accept("tilesets");
while (atTag("tileset")) {
    String filename = attrib("file");
    File tilesetFile = new File(filename);
    if (!tilesetFile.isAbsolute()) {
        // Resolve relative paths against the parent directory of `file`
        tilesetFile = new File(FilenameUtils.concat(file.getParent(), filename));
    }
    int tilesize = Integer.parseInt(attrib("tilesize"));
    Tileset t = new Tileset(tilesetFile, tilesize);
    t.setID(attrib("id"));
    tilesets.add(t);

    accept();
    close();
}
close();

expect("map");

int width    = Integer.parseInt(attrib("width"));
int height   = Integer.parseInt(attrib("height"));
int tilesize = Integer.parseInt(attrib("tilesize"));
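The recursive-descent style can be sketched directly on `javax.xml.stream.XMLEventReader`. `DescentReader` and its `atTag`/`expect`/`attrib`/`close` methods are hypothetical. They are written only to mirror the helper names in the snippet above and are not the library's actual implementation. Text content is skipped, since this style of parsing reads attributes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical recursive-descent helpers over XMLEventReader (not the library's code).
class DescentReader {
    private final XMLEventReader in;
    private StartElement current; // start tag most recently consumed by expect()

    DescentReader(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
    }

    // Discards non-tag events (document start, whitespace, text, comments) and
    // returns the next start or end tag without consuming it, or null at the end.
    private XMLEvent peekTag() throws XMLStreamException {
        while (in.hasNext()) {
            XMLEvent e = in.peek();
            if (e.isStartElement() || e.isEndElement()) {
                return e;
            }
            in.nextEvent();
        }
        return null;
    }

    // True if the next tag opens an element called `name`.
    boolean atTag(String name) throws XMLStreamException {
        XMLEvent e = peekTag();
        return e != null && e.isStartElement()
                && e.asStartElement().getName().getLocalPart().equals(name);
    }

    // Consumes the start tag `name`, failing if the next tag is anything else.
    void expect(String name) throws XMLStreamException {
        if (!atTag(name)) {
            throw new IllegalStateException("expected <" + name + ">");
        }
        current = in.nextEvent().asStartElement();
    }

    // Attribute of the element last opened by expect(), or null if absent.
    String attrib(String name) {
        Attribute a = current.getAttributeByName(new QName(name));
        return a == null ? null : a.getValue();
    }

    // Consumes the next tag, which must be an end tag.
    void close() throws XMLStreamException {
        XMLEvent e = peekTag();
        if (e == null || !e.isEndElement()) {
            throw new IllegalStateException("expected an end tag");
        }
        in.nextEvent();
    }

    // Usage in the style of the snippet above: "file@tilesize" for each <tileset>.
    static List<String> tilesets(String xml) throws XMLStreamException {
        DescentReader r = new DescentReader(xml);
        List<String> out = new ArrayList<>();
        r.expect("tilesets");
        while (r.atTag("tileset")) {
            r.expect("tileset");
            out.add(r.attrib("file") + "@" + r.attrib("tilesize"));
            r.close(); // </tileset>
        }
        r.close(); // </tilesets>
        return out;
    }
}
```

Each grammar rule becomes a method that consumes exactly the tags it owns. This is what makes the parsing code read like the document's structure.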
jaz303

Well, parsing XML is not an easy task.

Its basic structure is a tree in which any node can hold a container consisting of an array of more trees.

Each node in a tree contains a tag and a value, but in addition can contain an arbitrary number of named attributes and an arbitrary number of children or containers.

XML parsing tasks tend to fall into three categories.

Things that can be done with "regex". E.g. you want to find the value of the first "MailTo" tag and are not interested in the contents of any other tags.

Things you can parse yourself: the XML structure is always very simple, e.g. a root node and ten well-known tags with simple values.

All the rest! Even though an XML message format can look deceptively simple, home-made parsers are easily confused by extra attributes, CDATA and unexpected children. Full-blown XML parsers can handle all of these situations. Here the basic choice is between a stream parser and a DOM parser. If you intend to use most of the elements/attributes, in whatever order you want to use them, then a DOM parser is ideal. A stream parser is the way to go if you are only interested in a few attributes and intend to use them in the order they are presented, if you have performance constraints, or if the XML files are large (> 500MB). The callback mechanism takes a bit of "grokking", but it's actually quite simple to program once you get the hang of it.
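For illustration only, the first category above ("find the value of the first MailTo tag") might look like this with `java.util.regex`. The class name is made up. As the rest of the answer and the comments below point out, this is fragile. It misses `<MailTo>` tags that carry attributes or a namespace prefix, it can be fooled by comments and CDATA, and it returns entities such as `&amp;` undecoded.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstMailTo {
    // Naive on purpose: no attributes, prefixes, comments, CDATA or entity decoding.
    private static final Pattern MAILTO =
            Pattern.compile("<MailTo>(.*?)</MailTo>", Pattern.DOTALL);

    // Returns the raw text of the first <MailTo> element, or null if none matches.
    static String firstMailTo(String xml) {
        Matcher m = MAILTO.matcher(xml);
        return m.find() ? m.group(1) : null;
    }
}
```

The reluctant `(.*?)` stops at the first closing tag, so only the first element's text is captured.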

James Anderson
  • Are you seriously suggesting that one should use regexps or a home-grown XML parser for "simple" cases? -1 – gustafc Nov 12 '09 at 07:40
  • I would not really recommend it, except where performance was a big factor. For instance, if you were load balancing based on customer number, it might make sense just to scan for the first CustNo tag rather than firing up the full monster XML parser. – James Anderson Nov 12 '09 at 10:15
  • James, using a regex engine to match strings against expressions is a lot more expensive than a lexing-based approach like XML parsing, especially with a fast pull parser or SAX parser. I don't post this to "snub" you; I'm just letting you know in case you actually are rolling out the regexp approach in a massive, scalable app, because you might want to change that. – Riyad Kalla Feb 23 '11 at 20:06
  • Like I said, I wouldn't really recommend this approach. Perhaps I should have highlighted the disadvantages more in the post! – James Anderson Mar 21 '11 at 02:29