
I am currently working on a project that opens a TCP socket to a server and listens for incoming XML. The XML documents are fairly large at times, around 1-3 MB. The XML keeps arriving on the socket and I need to parse it as it comes. I tried several parsers, including DomParser, XMLPullParser and SaxParser. SAX seemed to be the fastest, so I proceeded with that. But now I sometimes get an OutOfMemory exception.

I read in this post that we should feed data to the parser in chunks:

How to parse huge xml data from webservice in Android application?

Can someone tell me how that is done? My current code looks like this:

InputSource xmlInputSource  =   new InputSource(new StringReader(response));
SAXParserFactory spf        =   SAXParserFactory.newInstance();
SAXParser sp                =   null;
XMLReader xr                =   null;
try{
    sp                      =   spf.newSAXParser();
    xr                      =   sp.getXMLReader();
    ParseHandler xmlHandler =   new ParseHandler(context.getSiteListArray().indexOf(website), context);
    xr.setContentHandler(xmlHandler);
    xr.parse(xmlInputSource);
    postSuccessfullParsingNotification();
}catch(SAXException e){
    e.printStackTrace();
}catch(ParserConfigurationException e){
    e.printStackTrace();
}catch (IOException e){
    e.printStackTrace();
}

Here response is the string I receive from the socket.

Should I look into other parsers like VTD-XML? Or is there a way to make SAX work efficiently?

Btw: whenever a new string arrives on the socket, I start a new thread to parse it.

This is my handler code:

public class ParseHandler extends DefaultHandler {
    private Website     mWebsite;
    private Visitor     mVisitor;
    private VisitorInfo mVisitorInfo;
    private boolean     isVisit;
    private boolean     isVisitor;
    private AppContext  appContext;

    public ParseHandler(int index,AppContext context){
        appContext          =   context;
        mWebsite            =   appContext.getSiteListArray().get(index);
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();        
    }

    @Override
    public void startElement(String namespaceURI, String localName,String qName, Attributes atts) 
            throws SAXException {
        if(localName.equals("visit")) {
            isVisit = true;            
        } else if(localName.equals("visitor") && isVisit) {
            isVisitor  = true; 
            mVisitor = new Visitor();
            mVisitor.mDisplayName = "Visitor - #"+atts.getValue("id");
            mVisitor.mVisitorId   = atts.getValue("id");
            mVisitor.mStatus      = atts.getValue("idle");
        } else if(localName.equals("info") && isVisitor){
            mVisitorInfo = mVisitor.new VisitorInfo();
            mVisitorInfo.mBrowser     = atts.getValue("browser");
            mVisitorInfo.mBrowserName = atts.getValue("browser").replace("+", " ");
            mVisitorInfo.mCity        = atts.getValue("city").replace("+", " ");
            mVisitorInfo.mCountry     = atts.getValue("country");
            mVisitorInfo.mCountryName = atts.getValue("country");
            mVisitorInfo.mDomain      = atts.getValue("domain");
            mVisitorInfo.mIp          = atts.getValue("ip");
            mVisitorInfo.mLanguage    = atts.getValue("language");
            mVisitorInfo.mLatitude    = atts.getValue("lat");
            mVisitorInfo.mLongitude   = atts.getValue("long");
            mVisitorInfo.mOrg         = atts.getValue("org").replace("+", " ");
            mVisitorInfo.mOs          = atts.getValue("os");
            mVisitorInfo.mOsName      = atts.getValue("os").replace("+", " ");
            mVisitorInfo.mRegion      = atts.getValue("region").replace("+", " ");
            mVisitorInfo.mScreen      = atts.getValue("screen");
        }
    }   

    @Override
    public void characters(char ch[], int start, int length) {
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        if(localName.equals("visit")) {
            isVisit  = false;
        } else if(localName.equals("visitor")) {
            isVisitor = false;
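            if(mVisitor == null){
                Log.e("mVisitor","mVisitor");
            } else if(mVisitor.mVisitorId == null){
                Log.e("mVisitor.mVisitorId","mVisitor.mVisitorId");   
            } else {
                // Store the visitor only when it and its id are non-null,
                // otherwise the put() below would dereference a null mVisitor.
                mWebsite.mVisitors.put(mVisitor.mVisitorId, mVisitor);
            }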
        } else if(localName.equals("info")  && isVisitor) {
            mVisitor.mVisitorInfo = mVisitorInfo;
        }
    }

    @Override
    public void endDocument() throws SAXException {

    }
}

EDIT: Afterthoughts

After further investigation I found that the parsing wasn't causing the exception. Every time I receive data from the socket I store it in a String and keep appending until I get "\n" in the stream. The "\n" denotes the end of a block of XML. Building this string is what causes the out-of-memory exception. I tried a StringBuilder, but that caused the same problem. I don't know why this is happening.

Now I tried passing the InputStream directly to the parser, but the "\n" at the end causes a parse exception. Is there anything we can set so that the parser will ignore "\n"?

blessanm86

2 Answers


It seems you're passing the whole XML document to the parser, so whenever it is too big you get the OutOfMemory exception.

You should try to read the output from the socket in chunks and feed it to the parser as it comes. So you would do the xr.parse() inside a loop.
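A minimal sketch of that idea, assuming the socket delivers a single XML document: instead of collecting the response into a String, the socket's InputStream can be handed to the parser, which then pulls the data in chunks itself. The names `StreamingSaxSketch`, `CountingHandler` and `countVisitors` are made up for illustration, and the handler only counts visitor elements as a stand-in for the real ParseHandler.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; this is a sketch, not the asker's code.
class StreamingSaxSketch {

    /** Counts <visitor> elements; stands in for the real ParseHandler. */
    static class CountingHandler extends DefaultHandler {
        int visitors = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default factory is not namespace-aware, so qName is the reliable name here.
            if ("visitor".equals(qName)) {
                visitors++;
            }
        }
    }

    /** Parses straight from the stream, so no String copy of the document is built. */
    static int countVisitors(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler); // the parser reads from the stream as it goes
        return handler.visitors;
    }
}
```

With a socket this would be called as `countVisitors(socket.getInputStream())`. Note that `parse` keeps reading until the stream ends, so on a long-lived connection carrying several documents some framing is still needed.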

Smugrik
  • After further investigation I found that my parsing wasn't causing the exception. Every time I receive data from the socket I store it in a String and keep appending until I get "\n" in the stream. The "\n" denotes the end of a block of XML. The string is causing the memory exception. I tried a StringBuilder, but that caused the same problem. I don't know why this is happening. Now I tried your method and the parsing goes fine, but the "\n" at the end causes a parse exception. Is there anything we can set so that the parser will ignore "\n"? – blessanm86 Aug 13 '11 at 10:21
  • I'm not sure here as it was long ago, but I believe there are some options in the parser to tell it to ignore insignificant whitespace, such as formatting tabs and spaces at the beginning of lines, and linefeeds – Smugrik Aug 13 '11 at 11:56
  • Check the doc here for an idea: ignorableWhitespace http://download.oracle.com/javase/1.4.2/docs/api/org/xml/sax/ContentHandler.html#ignorableWhitespace(char[], int, int) – Smugrik Aug 13 '11 at 12:07
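A small check that may help narrow this down. ignorableWhitespace is a callback on the handler rather than a parser switch, and the XML grammar itself permits whitespace after the root element, so a lone trailing "\n" should be accepted. What is rejected is further markup after the root, which is what the parser sees if the next document follows on the same stream. Whether that is the cause of the exception above is an assumption; `TrailingNewlineSketch` and `parses` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for experimenting with what the parser accepts.
class TrailingNewlineSketch {

    /** Returns true if the text parses as one well-formed XML document. */
    static boolean parses(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new RuntimeException(e); // configuration or I/O problem
        }
    }
}
```

Here `parses("<visit/>\n")` succeeds, while `parses("<visit/>\n<visit/>\n")` fails because a second root element follows the first.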

Another post was made on SO about my problem, and the answer there was the solution to my problem.

Here's the solution for anyone having this problem:

Reading big chunk of xml data from socket and parse on the fly
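The linked answer is not reproduced here. As a rough sketch of one way to parse on the fly under the framing described in the question (each document ends with "\n" and contains no newline inside it, which is an assumption), a wrapper stream can report end-of-stream at each "\n" so the parser sees one document per call and nothing is buffered into a String. `NewlineFramedSax`, `FrameInputStream` and `parseAll` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; assumes '\n' never occurs inside a document.
class NewlineFramedSax {

    /** Exposes the bytes up to the next '\n' as if they were a whole stream. */
    static class FrameInputStream extends FilterInputStream {
        private boolean ended = false; // set once '\n' or the real end is reached

        FrameInputStream(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            if (ended) return -1;
            int b = in.read();
            if (b == -1 || b == '\n') {
                ended = true;
                return -1;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) return 0;
            int n = 0;
            while (n < len) {
                int b = read();
                if (b == -1) break;
                buf[off + n++] = (byte) b;
                if (in.available() == 0) break; // return what we have rather than block
            }
            return n == 0 ? -1 : n;
        }

        @Override
        public long skip(long n) throws IOException {
            long skipped = 0;
            while (skipped < n && read() != -1) skipped++;
            return skipped;
        }

        @Override public boolean markSupported() { return false; }

        /** Discards whatever is left of this frame, e.g. after a parse error. */
        void drain() throws IOException {
            while (read() != -1) { /* discard */ }
        }

        @Override public void close() { /* leave the underlying socket stream open */ }
    }

    /** Parses each newline-terminated document in turn; returns how many were parsed. */
    static int parseAll(InputStream socketIn, DefaultHandler handler) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BufferedInputStream in = new BufferedInputStream(socketIn);
        int documents = 0;
        while (true) {
            in.mark(1);
            int first = in.read();
            if (first == -1) break;      // connection closed
            if (first == '\n') continue; // empty frame, nothing to parse
            in.reset();
            FrameInputStream frame = new FrameInputStream(in);
            try {
                parser.parse(frame, handler);
                documents++;
            } finally {
                frame.drain(); // keep the stream aligned on the next document
            }
        }
        return documents;
    }
}
```

The same handler instance receives startDocument and endDocument once per frame. The no-op close() is there because a parser may close its input when it finishes, which would otherwise close the socket after the first document.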

blessanm86