I'm using JDOM with my Android project, and every time I get a certain set of characters in my server response, I end up with these error messages:

05-04 10:08:46.277: E/PARSE: org.jdom.input.JDOMParseException: Error on line 95 of document UTF-8: At line 95, column 5263: unclosed token

05-04 10:08:46.277: E/Error Handler: Handler failed: org.jdom.input.JDOMParseException: Error on line 1: At line 1, column 0: syntax error

When I make the same query through Google Chrome, I can see that all of the XML came through fine, and that there are in fact no unclosed tokens. I have run into this problem several times during the development of the application, and the solution has always been to remove odd non-ASCII characters (copyright symbols, trademark characters, etc. that got copied and pasted into those data fields). How can I get the parser to either (a) remove those characters or (b) strip them and continue parsing? Here's an example of one of my parse functions.

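import java.io.BufferedReader;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

import android.util.Log;

public static boolean parseUserData(BufferedReader br) {
    SAXBuilder builder = new SAXBuilder();
    Document document = null;

    try {
        // Throws JDOMParseException if the response is not well-formed XML
        document = builder.build(br);

        /* XML Output to Logcat */
        if (document != null) {
            XMLOutputter outputter = new XMLOutputter(
                    Format.getPrettyFormat());
            String xmlString = outputter.outputString(document);
            Log.e("XML", xmlString);
        }

        Element rootNode = document.getRootElement();
        if (!rootNode.getChildren().isEmpty()) {

            // Do stuff
            return true;
        }

    } catch (Exception e) {
        GlobalsUtil.errorUtil
                .setErrorMessage("Error Parsing XML: User Data");
        Log.e(DEBUG_TAG, e.toString());
        return false;
    }

    // Root element has no children, so there is nothing to process
    return false;
}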
RyanInBinary
  • Can you upload an example response somewhere so we can see it? Also you say a certain set of characters cause a problem, but which ones? Where are they being used? – Jules May 04 '12 at 15:28
  • I cannot upload a response, as the code above is an adjusted version (variables and method names changed) of our actual code. I cannot upload the error-ridden XML response because it contains sensitive customer information. The errors appear when our customers copy/paste things into fields. We have had them copy/paste from their emails, and things like "Powered By Motorola(tm)" (with an actual trademark character in place of "(tm)") will show up and cause problems. – RyanInBinary May 04 '12 at 15:36

2 Answers

Is the BufferedReader constructed to take the encoding argument? Perhaps you need to tell the Reader or InputStream that you pass to use UTF-8.
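
To illustrate the point, here is a minimal sketch of how the charset given to the InputStreamReader changes what the parser sees. It assumes the server actually sends UTF-8, which the thread does not confirm; the class and method names (ReaderEncodingSketch, readFirstLine) are made up for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class ReaderEncodingSketch {

    // Reads the first line of the stream, decoding its bytes with the given charset.
    static String readFirstLine(InputStream in, Charset charset) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(in, charset));
        return br.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the server encodes its response as UTF-8.
        byte[] utf8 = "<name>Motorola\u2122</name>".getBytes(StandardCharsets.UTF_8);

        // Decoding with the matching charset keeps the trademark sign as one character.
        String good = readFirstLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);

        // Decoding the same bytes as ISO-8859-1 turns the 3-byte sequence into 3 characters.
        String bad = readFirstLine(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        System.out.println(good.length() + " vs " + bad.length());
    }
}
```

Another option may be to skip the Reader and pass the raw InputStream to SAXBuilder.build(InputStream), which lets the XML parser choose the encoding from the XML declaration itself.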

duffymo
  • That sounds like it could be part of the issue, how can I set that within the created BufferedReader – RyanInBinary May 04 '12 at 15:33
  • I don't see such an argument in the javadocs. It's got to be set in the object you wrap with BufferedReader. – duffymo May 04 '12 at 15:33
  • In my query code (which grabs and returns the BufferedReader that is later passed off), I set ISO-8859-1, not UTF-8: BufferedReader br = new BufferedReader(new InputStreamReader( response.getEntity().getContent(), "ISO-8859-1")); – RyanInBinary May 04 '12 at 15:38
  • I used an XML validation tool, http://www.xmlvalidation.com. When I put my XML in there, it appears that the "&" character commonly used in the element is what is causing the issue. Is there any way to prevent that? – RyanInBinary May 04 '12 at 15:58
  • Make that UTF-8 and see if that helps. Ampersand is a big problem - that's not well-formed XML. You've got to encode "magic characters". – duffymo May 04 '12 at 16:00
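
The cleanest fix for the ampersand is for the server to escape it as &amp;amp; when it writes the XML. If the server cannot be changed, one possible client-side workaround is to escape bare ampersands in the response text before handing it to the parser. The sketch below is only an illustration: AmpersandEscaper and escapeBareAmpersands are hypothetical names, and the regex only recognizes the five predefined XML entities plus numeric character references.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of JDOM or the original code.
class AmpersandEscaper {

    // Matches "&" unless it already starts a predefined entity or a numeric character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare "&" with "&amp;" and leaves existing references untouched.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("<co>AT&T &amp; Sons</co>"));
    }
}
```

Note that this would also rewrite ampersands inside CDATA sections and references to custom entities, so it is a stopgap rather than a substitute for well-formed output from the server.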

It distinctly sounds like a character encoding issue. I think duffymo is correct in his assessment. I have two comments, though.

If you are getting your data through a URL, you should use URLConnection.getContentType() to get the charset (if it is set and the charset is not null) and use that to set up the InputStreamReader on the URL's InputStream.
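
As a rough sketch of that idea, the helper below pulls the charset parameter out of a Content-Type header value such as "text/xml; charset=UTF-8" and falls back to a default when none is present. ContentTypeCharset and charsetOf are hypothetical names, and the parsing is deliberately simple.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class ContentTypeCharset {

    // Returns the charset named in a Content-Type value, or the fallback if none is usable.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Parameter names are case-insensitive, so compare "charset=" ignoring case.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Unknown or malformed charset name: use the fallback instead.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/xml; charset=ISO-8859-1", StandardCharsets.UTF_8));
    }
}
```

The returned Charset can then be passed to the InputStreamReader constructor in place of a hard-coded encoding name.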

Have you tried JDOM 2.0.1? It is the first JDOM version that is fully tested on Android (and the only 'supported' JDOM version on Android). JDOM 2.0.1 also has a number of performance tweaks and memory optimizations that should make your processing faster. It also fixes a number of bugs, though from what I see you should not run into any of them.

Check out https://github.com/hunterhacker/jdom/wiki/JDOM2-Migration-Issues and https://github.com/hunterhacker/jdom/wiki/JDOM2-and-Android

rolfl
  • I have updated to JDOM 2, which has not fixed anything yet, but it may keep me from having errors in the future, so I appreciate that bit of information. I'm not sure about the URL content type, though. I have been using HttpResponse, and then getEntity().getContent() is passed into my InputStreamReader. Is that a poor way of handling it? – RyanInBinary May 04 '12 at 18:22
  • http://developer.android.com/reference/org/apache/http/HttpEntity.html#getContentType%28%29 <--- you should expect a value something like... actually, just look at this response here: http://stackoverflow.com/questions/1381617/simplest-way-to-correctly-load-html-from-web-page-into-a-string-in-java – rolfl May 04 '12 at 19:18