8

This code generates an XML document from its String representation. It works fine in my small unit tests, but fails on my actual XML data. The line that triggers the failure is Document doc = db.parse(is);

Any ideas?

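// Imports needed by the method below.
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public static Document FromString(String xml)
{
    // from http://www.rgagnon.com/javadetails/java-0573.html
    try
    {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Wrap the string in a character stream so the parser can read it.
        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(xml));

        Document doc = db.parse(is); // the exception is thrown here
        doc.normalize();

        return doc;
    }
    catch (Exception e)
    {
        // Log is the asker's own logging helper.
        Log.WriteError("Failed to parse XML", e, "XML.FromString(String)");
        return null;
    }
}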
Jorgesys
Kurru
  • 1
    It is likely not your code, but the "XML" string that you are loading and attempting to parse. If it isn't XML, then it will throw parse exceptions when it encounters things like elements that are not closed, invalid characters, etc. – Mads Hansen Feb 14 '11 at 01:49
  • What does the exception say? Have you tested your XML against an outside source to make sure it is valid? – lavinio Feb 14 '11 at 01:50
  • Found this message in the exception: "PI must not start with xml (position:unknown xm@1:5 in java.io.StringReader@4625d540)". I'm not sure what this means, since I'm fairly sure the 1st character is < – Kurru Feb 14 '11 at 02:20
  • 5
    Typically you get this if you have extra whitespace before the XML declaration. This is not allowed: if you do have an XML declaration, it MUST start without any leading whitespace. On the other hand, processing instructions (PI) are not allowed to have a target name of "xml", hence the error message. – StaxMan Feb 14 '11 at 04:29
  • You can also get this if you read the string from a stream using the wrong encoding. – Ted Hopp Feb 14 '11 at 05:00
  • I think there's something in the string before the <?xml declaration. – Michael Kay Feb 14 '11 at 09:58
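
One way to check what the comments above suggest is to list whatever sits in front of the first < in the string. The following is only a sketch; the class and method names (XmlPrefixInspector, describePrefix) are made up for illustration and are not part of the original code.

```java
/** Hypothetical helper: shows which characters precede the first '<' in an XML string. */
final class XmlPrefixInspector {
    private XmlPrefixInspector() {}

    /**
     * Returns the code points found before the first '<', formatted as
     * U+XXXX and separated by spaces. Returns an empty string when the
     * input already starts with '<'.
     */
    static String describePrefix(String xml) {
        int start = xml.indexOf('<');
        // If there is no '<' at all, the whole string is "prefix".
        String prefix = start < 0 ? xml : xml.substring(0, start);
        StringBuilder sb = new StringBuilder();
        prefix.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }
}
```

Logging describePrefix(xml) just before db.parse(is) would show, for example, U+FEFF for a byte order mark or U+0020 / U+000A for stray whitespace.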

6 Answers

16

Thanks for your help everyone.

I discarded the <?xml version="1.0" encoding="utf-8"?> declaration, which cleared this error. I still don't understand what the reason for it might be, but it worked nonetheless.

I went on to find one of my buffered writers (when extracting from a zip file into memory) wasn't being flushed, which was causing the xml string to be incomplete.
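
As a rough illustration of the flushing issue (not the original extraction code), the sketch below copies characters through a BufferedWriter and only reads the result after the writer has been closed. The class and method names are hypothetical, and the Reader parameter stands in for whatever stream the zip entry is read from.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

/** Hypothetical helper showing that buffered output must be flushed before it is used. */
final class FlushBeforeUse {
    private FlushBeforeUse() {}

    /** Copies everything from source into a String via a BufferedWriter. */
    static String copyToString(Reader source) {
        StringWriter sink = new StringWriter();
        // try-with-resources closes the writer, and close() flushes any
        // characters still held in the buffer into the StringWriter.
        try (BufferedWriter out = new BufferedWriter(sink)) {
            char[] buf = new char[4096];
            int n;
            while ((n = source.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Only read the sink after the writer has been closed (or flushed).
        return sink.toString();
    }
}
```

Calling sink.toString() inside the try block, before the writer is closed or flushed, could return a truncated string, which matches the incomplete XML described above.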

Thanks everyone for your help!

Garima Tiwari
Kurru
  • Hi Kurru, how did you discard the declaration? Did you use a transformer or a substring or what? I've been struggling with this for a while... – Jon Wells Nov 07 '11 at 16:27
  • 1
    There are a few ways you could do this. Since I could rely on the formatting, I think I just threw away the 1st line of the XML. You could also try to substring it at the 1st > symbol. – Kurru Nov 07 '11 at 17:28
  • 1
    Thanks for the reply, I ended up using the substring approach. – Jon Wells Nov 07 '11 at 17:30
  • 1
    @CrimsonChin Feel free to upvote my answer as a reward :P Upvoted comments don't get rep, I'm afraid! – Kurru Nov 07 '11 at 18:39
  • 2
    After getting the string to a variable I just used `xml=xml.replace("", "");` and the error was gone – Gabriel Kaffka Nov 10 '13 at 01:24
3

Check whether your XML file starts with a BOM (byte order mark) header.
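
A minimal sketch of that check, assuming the XML has already been decoded into a String so that the BOM shows up as the single character U+FEFF. The class and method names are hypothetical.

```java
/** Hypothetical helper: removes a leading byte order mark from decoded XML text. */
final class BomStripper {
    private BomStripper() {}

    /** Returns xml without a leading U+FEFF; other input is returned unchanged. */
    static String stripBom(String xml) {
        if (xml != null && !xml.isEmpty() && xml.charAt(0) == '\uFEFF') {
            return xml.substring(1);
        }
        return xml;
    }
}
```

The result can then be passed to the parsing method from the question, for example FromString(BomStripper.stripBom(xml)).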

shaobin0604
3

I had the same problem while parsing XML generated by PHP. After I added the Content-Type header "text/xml", it worked like a charm.

Sean Vieira
Timo Bakx
2

As @StaxMan said, remove any unknown characters before the first <:

responseBody = responseBody.substring(responseBody.indexOf("<"));

jowett
1

This issue can also be caused by having the line < ?xml version="1.0" encoding="UTF-8"?> on the same line as the XML data:

< ?xml version="1.0" encoding="UTF-8"?>< secciones>< seccion>< id>0< /id>< nombre>Portada< feedURL>http://iphone.elnorte.com/libre/online07/a ....

Jorgesys
0

You should have checked the encoding of the file instead of discarding the xml line.

I have found that my Eclipse (on Windows) had the same problem with a resource encoded as Unix-U8. After converting it to DOS-U8, the error went away.

Saran