Android - Generic XML Parser using SAXParser

Question

I need to write an XML parsing class that I can reuse throughout my Android application and from what I've read a SAXParser is the best for a mobile application. I am using this guide:

http://www.jondev.net/articles/Android_XML_SAX_Parser_Example

And the type of document I wish to parse is a feed from the Blogger GData API - example would be:

<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
<feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearch/1.1/' xmlns:gd='http://schemas.google.com/g/2005' gd:etag='W/&quot;CUIGRnc4fyp7ImA9Wx9SEEg.&quot;'>
    <id>tag:blogger.com,1999:user-464300745974.blogs</id>
    <updated>2010-11-29T17:58:47.937Z</updated>
    <title>Tim's Blogs</title>
    <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/blogs'/>
    <link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/blogs'/>
    <link rel='alternate' type='text/html' href='http://www.blogger.com/profile/blogid'/>
    <author>
        <name>Tim</name>
        <uri>http://www.blogger.com/profile/blogid</uri>
        <email>noreply@blogger.com</email>
    </author>
    <generator version='7.00' uri='http://www.blogger.com'>Blogger</generator>
    <openSearch:totalResults>2</openSearch:totalResults>
    <openSearch:startIndex>1</openSearch:startIndex>
    <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
    <entry gd:etag='W/&quot;DUIBQHg-cCp7ImA9Wx9TF0s.&quot;'>
        <id>tag:blogger.com,1999:user-464300745974.blog-blogid</id>
        <published>2010-06-22T10:59:38.603-07:00</published>
        <updated>2010-11-26T02:32:31.658-08:00</updated>
        <title>Application Testing Blog</title>
        <summary type='html'>This blog is for testing the Android application.</summary>
        <link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/blogs/blogid'/>
        <link rel='alternate' type='text/html' href='http://devrum.blogspot.com/'/>
        <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://devrum.blogspot.com/feeds/posts/default'/>
        <link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/posts/default'/>
        <link rel='http://schemas.google.com/blogger/2008#template' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/template'/>
        <link rel='http://schemas.google.com/blogger/2008#settings' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/settings'/>
        <author>
            <name>Tim</name>
            <uri>http://www.blogger.com/profile/blogid</uri>
            <email>noreply@blogger.com</email>
        </author>
    </entry>
    <entry gd:etag='W/&quot;C08HRXo4eSp7ImA9Wx9TE0o.&quot;'>
        <id>tag:blogger.com,1999:user-464300745974.blog-515600026106499737</id>
        <published>2010-06-22T10:59:00.328-07:00</published>
        <updated>2010-11-21T12:37:14.431-08:00</updated>
        <title>Development Blog</title>
        <summary type='html'>etc</summary>
        <link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/blogid/blogs/515600026106499737'/>
        <link rel='alternate' type='text/html' href='http://rumdev.blogspot.com/'/>
        <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://rumdev.blogspot.com/feeds/posts/default'/>
        <link rel='http://schemas.google.com/g/2005#post' type='application/atom+xml' href='http://www.blogger.com/feeds/515600026106499737/posts/default'/>
        <link rel='http://schemas.google.com/blogger/2008#template' type='application/atom+xml' href='http://www.blogger.com/feeds/515600026106499737/template'/>
        <link rel='http://schemas.google.com/blogger/2008#settings' type='application/atom+xml' href='http://www.blogger.com/feeds/515600026106499737/settings'/>
        <author>
            <name>Tim</name>
            <uri>http://www.blogger.com/etc</uri>
            <email>noreply@blogger.com</email>
        </author>
    </entry>
</feed>

I need to parse the blog IDs and post IDs out of feeds like the one above. None of the SAX examples I have found are generic. I'd like to write a reusable parser; do you have any examples of how I could adapt a SAX parser accordingly?

Peter Knego · Accepted Answer · 2011-02-28T15:55:16.220

SAX parsers are event-driven. You write a handler with one callback for each type of XML event (start of an element, end of an element, character data, and so on); attributes are delivered together with the start-element event. The parser then walks through the XML document and reports each event to you, which means the matching methods in your handler get called.
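
To make the callback order concrete, here is a minimal sketch (not part of the original answer) that only records each event as it fires. The class name SaxEventTrace and the helper trace() are made up for illustration. It assumes the standard javax.xml.parsers API and matches on qName, because a SAXParserFactory is not namespace-aware by default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each SAX callback in the order it fires.
final class SaxEventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace between elements also arrives here, so skip blank chunks.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    // Parses an XML string and returns the recorded callback sequence.
    static List<String> trace(String xml) {
        SaxEventTrace handler = new SaxEventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.events;
    }
}
```

Calling SaxEventTrace.trace("<feed><id>42</id></feed>") should give start:feed, start:id, text:42, end:id, end:feed, which is the sequence your handler has to interpret.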

In your particular case you are looking for two node sequences:

<feed><id>
<feed><entry><id>

So you have to store some state info inside your handler to know where you are. Here is the code (didn't try it myself, you'll have to debug it):

import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DataHandler extends DefaultHandler {

    private Feed feed;
    private Entry currentEntry;
    private boolean isId = false;
    // characters() may deliver one text node in several chunks, so buffer it
    private final StringBuilder idText = new StringBuilder();

    public Feed getFeed() {
        return feed;
    }

    // Note: localName is empty unless the parser does namespace processing
    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        if (localName.equals("feed")) {
            feed = new Feed();
        } else if (localName.equals("entry")) {
            currentEntry = new Entry();
        } else if (localName.equals("id")) {
            isId = true;
            idText.setLength(0);
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        if (localName.equals("feed")) {
            // </feed> - do nothing, it's the end
        } else if (localName.equals("entry")) {
            // </entry> - save current entry and reset variable
            feed.entries.add(currentEntry);
            currentEntry = null;
        } else if (localName.equals("id")) {
            // </id> - the whole ID has been read, so store it now
            String id = idText.toString().trim();
            if (currentEntry != null) {
                currentEntry.id = id;
            } else {
                feed.id = id;
            }
            isId = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!isId) return;
        idText.append(ch, start, length);
    }

    // Public and static so callers of getFeed() can actually use the result
    public static class Feed {
        public String id;
        public List<Entry> entries = new ArrayList<Entry>();
    }

    public static class Entry {
        public String id;
    }
}
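
If you want something more generic than the handler above, one option is to track the current element path and collect text for whichever paths you ask for. The following is only a sketch under a few assumptions: PathCollector and collect() are hypothetical names, it uses Java 8 APIs (String.join, computeIfAbsent), it turns on namespace awareness so that localName is populated, and it is meant for leaf elements such as id rather than mixed content.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical reusable handler: collects the text of every element whose
// path from the root (e.g. "feed/entry/id") is in the requested set.
final class PathCollector extends DefaultHandler {
    private final Set<String> wantedPaths;
    private final Deque<String> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final Map<String, List<String>> results = new HashMap<>();

    PathCollector(Set<String> wantedPaths) {
        this.wantedPaths = wantedPaths;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.addLast(localName); // remember where we are in the document
        text.setLength(0);        // start a fresh buffer for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String path = String.join("/", stack); // root first, current element last
        if (wantedPaths.contains(path)) {
            results.computeIfAbsent(path, k -> new ArrayList<>()).add(text.toString().trim());
        }
        stack.removeLast();
        text.setLength(0);
    }

    // Parses an XML string and returns the collected text, keyed by element path.
    static Map<String, List<String>> collect(String xml, Set<String> wantedPaths) {
        PathCollector handler = new PathCollector(wantedPaths);
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for localName to be filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.results;
    }
}
```

For the Blogger feed you would ask for the paths "feed/id" and "feed/entry/id" and read the two lists from the returned map. The same class can then be reused for other documents by passing a different set of paths.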

score 1 · Answer 2 · answered Feb 28 '11 at 15:42

Try something along the lines of this:

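import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XmlParser extends DefaultHandler {
    // STATE_NONE must differ from STATE_FEED, otherwise every unknown element looks like a feed
    private static final int STATE_NONE = 0;
    private static final int STATE_FEED = 1;
    private static final int STATE_ID   = 2;

    // Instance fields (not static) so each parser object keeps its own state
    private int    mDepth        = 0;
    private int    mCurrentState = STATE_NONE;
    private String mTheString    = "";

    public XmlParser() {}

    @Override
    public void startDocument() throws SAXException {}

    @Override
    public void endDocument() throws SAXException {}

    // Whether localName or qName is populated depends on the parser's namespace settings
    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        mTheString = "";
        mDepth++;
        if (qName.equals("feed")) {
            mCurrentState = STATE_FEED;
            return;
        }
        if (qName.equals("id")) {
            mCurrentState = STATE_ID;
            return;
        }
        mCurrentState = STATE_NONE;
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        mDepth--;
        // mCurrentState reflects the most recently opened element, so these
        // cases fire reliably only for elements without children, such as <id>
        switch (mCurrentState) {
            case STATE_FEED:
                // Do something with the feed
                break;
            case STATE_ID:
                // Save the ID (held in mTheString) or do whatever
                break;
            default:
                // Do nothing
                break;
        }
        mCurrentState = STATE_NONE;
        mTheString = "";
    }

    @Override
    public void characters(char ch[], int start, int length) {
        // Append, because the text of one element can arrive in several calls
        mTheString = mTheString + new String(ch, start, length);
    }
}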

You can access the custom SAXParser with something like this:

InputStream stream = //whatever your stream is (the document)
XmlParser handler = new XmlParser(); // your custom parser
XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setContentHandler(handler); 
xmlreader.parse(new InputSource(stream));
   // Then you can create a method in the handler, like getResults to return the list of elements or something here.

So you pass your custom handler to the XMLReader and collect the results from the handler once parsing is done. During parsing, the reader first calls startDocument and then walks through the elements in the XML, calling startElement when an element opens and endElement when it closes. The characters method is called between these two, possibly several times, and delivers the element's text, which you can then process however you like in endElement. Parsing is finished when endDocument is called, so you can set things up and tear them down at the start and end of individual elements or of the whole document if you wish.

Hope this helps, and is close to what you are looking to do.

Thanks for the response. I'm looking at your suggestion here and the one above. I have combined the two, what I am unsure is how to return the results. As the parser runs through am I supposed to be storing the values in variables and then after the .parse I then call that method? Sorry I am finding this SAXParser a hard concept to grasp. — Tim, Mar 01 '11 at 18:48
In my example above, the mTheString variable holds the data for the element. When you get to a new element, it runs through the characters() method a few times (depending on how much data is there) and so you must add the data to the string to build the entire value. Once the whole value is read in, endElement is called. So for example, if you want to add every id to a list, you would call "myList.add(mTheString);" in endElement(), and so when the whole document is parsed, myList will contain a list of all the id's. You can then call a method like getMyList(). Hope this helps — biddulph.r, Mar 03 '11 at 15:28
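
As a rough sketch of what that comment describes (not code from either answer): buffer the text in characters(), add it to a list in endElement(), and read the list through a getter after parse() returns. IdListHandler, getIds() and parseIds() are hypothetical names, and the parser here is not namespace-aware, so the element name is matched on qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers text in characters(), stores it in endElement(),
// and exposes the collected values through a getter once parsing is done.
final class IdListHandler extends DefaultHandler {
    private final List<String> ids = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // a new element starts, so discard old text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("id".equals(qName)) {
            ids.add(buffer.toString().trim()); // the full value is known only here
        }
        buffer.setLength(0);
    }

    List<String> getIds() {
        return ids;
    }

    // parse() is synchronous: when it returns, the handler holds the full list.
    static List<String> parseIds(String xml) {
        IdListHandler handler = new IdListHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.getIds();
    }
}
```

So yes, you store values in fields of the handler while the parser runs, and only ask the handler for them after the parse call has returned.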
