
I am trying to POST a potentially large chunk of XML from a C# client to a GAEJ (Google App Engine for Java) app and then parse it into a DOM document.

I've managed to get the DocumentBuilder to parse the XML by reading the request data into a string and then trimming it, like this:

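        // Read the whole request body into memory, one line at a time.
        // Note: readLine() drops line terminators, so lines are joined without them.
        BufferedReader rdr = req.getReader();
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = rdr.readLine()) != null) {
            result.append(line);
        }
        String xml = result.toString();
        // trim() strips leading/trailing characters with code points <= U+0020.
        db = dbf.newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml.trim())));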

However, the GAEJ app should be as efficient as possible, and reading a potentially large XML input into a string line by line, rather than feeding the input stream straight to the parser, seems wasteful. I would like the following to work:

        Document doc = db.parse(req.getInputStream());

But then I always get "org.xml.sax.SAXParseException: Content is not allowed in trailing section." If I dump the contents of the request's input stream to the console, I can see some box characters after the final closing tag, but I'm not sure how they got there (the client side is using UTF-8 encoding) or how to remove them from the input stream. Thanks!
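Box characters on a console usually stand for bytes that have no printable glyph, so it can help to look at the raw byte values instead. The following is only a diagnostic sketch: the class name `TailDump` and the helper `hexTail` are hypothetical, and it assumes the request body has already been read fully into a byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TailDump {
    /** Returns the last n bytes of data (or all of it, if shorter) as space-separated hex. */
    static String hexTail(byte[] data, int n) {
        int start = Math.max(0, data.length - n);
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < data.length; i++) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a request body: a tiny document followed by three zero bytes.
        byte[] xml = "<a/>".getBytes(StandardCharsets.UTF_8);
        byte[] padded = Arrays.copyOf(xml, xml.length + 3);
        System.out.println(hexTail(padded, 5)); // prints "2f 3e 00 00 00"
    }
}
```

If the tail of the real request shows values such as `00` after the final `3e` (`>`), the extra bytes were sent by the client rather than introduced by the parser.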

tempy
    Solved my own problem, which turned out to be on the client side and not in the GAE app at all. At some point I was cloning the MemoryStream containing the XML using MemoryStream.GetBuffer(), and then POSTing the result. Bad move! MemoryStream.GetBuffer() returns the whole underlying buffer, including the preallocated but unused bytes past the end of the data, resulting in an invalid XML post. MemoryStream.ToArray(), which copies only the bytes actually written, works as expected. Reader beware! – tempy Jan 29 '10 at 20:18
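The effect described in the comment can be reproduced on the Java side without a servlet container. This is a minimal sketch under stated assumptions: the padding is assumed to be zero bytes, a `ByteArrayInputStream` stands in for the request's input stream, and the names `PaddedXmlDemo`, `parseRoot`, and `parses` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class PaddedXmlDemo {
    /** Parses the body straight from a stream and returns the root element name. */
    static String parseRoot(byte[] body) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(body));
        return doc.getDocumentElement().getNodeName();
    }

    /** Returns true if the body parses, false if the parser rejects it. */
    static boolean parses(byte[] body) {
        try {
            parseRoot(body);
            return true;
        } catch (SAXException e) {
            // SAXParseException is a subclass of SAXException.
            return false;
        } catch (Exception e) {
            // Configuration or I/O problems are not what this demo is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] clean = "<root><item/></root>".getBytes(StandardCharsets.UTF_8);
        // Simulate a buffer that is larger than the data it holds.
        byte[] padded = Arrays.copyOf(clean, clean.length + 16);
        System.out.println("clean parses:  " + parses(clean));
        System.out.println("padded parses: " + parses(padded));
    }
}
```

With the exact bytes, the parser can consume the stream directly, with no intermediate string and no trimming. The padded variant is rejected because nothing but whitespace, comments, or processing instructions may follow the root element.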