My code writes an XML file with the LSSerializer class:

DOMImplementation impl = doc.getImplementation();
DOMImplementationLS implLS = (DOMImplementationLS) impl.getFeature("LS","3.0");

LSSerializer ser = implLS.createLSSerializer();

String str = ser.writeToString(doc);
System.out.println(str);

String file = racine+"/"+p.getNom()+".xml";
OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");
out.write(str);
out.close();

The XML is well-formed, but when I parse it, I get an error.

Parsing code:

File f = new File(racine+"/"+filename);

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(f);

XPathFactory xpfactory = XPathFactory.newInstance();
XPath xp = xpfactory.newXPath();

String expression;

expression = "root/nom";        
String nom = xp.evaluate(expression, doc);

The error:

[Fatal Error] Terray.xml:1:40: Content is not allowed in prolog.
9 août 2011 19:42:58 controller.MakaluController activatePatient
GRAVE: null
org.xml.sax.SAXParseException: Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:249)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:208)
at model.MakaluModel.setPatientActif(MakaluModel.java:147)
at controller.MakaluController.activatePatient(MakaluController.java:59)
at view.ListePatientsPanel.jButtonOKActionPerformed(ListePatientsPanel.java:92)
...

Now, after some research, I found that this error is due to a "hidden" character at the very beginning of the XML.

In fact, I can fix the bug by creating an XML file manually.

But where is the error in the XML writing? (When I try to println the string, there is no space before the

Solution: change the serializer

I ran with the UTF-16 encoding workaround for a while, but it was not very stable. So I found a new solution: change the serializer of the XML document, so that the encoding declared in the XML header is consistent with the encoding of the file:

    DOMSource domSource = new DOMSource(doc);
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer transformer = tf.newTransformer();

    String file = racine+"/"+p.getNom()+".xml";
    OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");

    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.INDENT,"yes");
    transformer.transform(domSource, new StreamResult(out));
    out.close();
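
For what it's worth, the same consistency can probably be reached without giving up LSSerializer. The DOM Load and Save API specifies that `writeToString` always produces UTF-16 output, so the XML declaration in that string names UTF-16 even when the string is later written as UTF-8 bytes. Below is a minimal sketch, assuming the JDK's built-in DOM implementation, that lets the serializer write the bytes itself through an `LSOutput` with an explicit encoding. The class name `LsOutputSketch`, the helper methods and the sample document are made up for illustration; a `FileOutputStream` would take the place of the in-memory stream in real code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class LsOutputSketch {

    // Hypothetical sample document, shaped like the one queried with "root/nom".
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            Element nom = doc.createElement("nom");
            nom.setTextContent("Terray");
            root.appendChild(nom);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize through an LSOutput so the declared encoding and the bytes agree.
    static byte[] toUtf8Bytes(Document doc) {
        DOMImplementationLS implLS =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer ser = implLS.createLSSerializer();
        LSOutput output = implLS.createLSOutput();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);   // the serializer encodes the bytes itself
        output.setEncoding("UTF-8");   // also used for the XML declaration
        ser.write(doc, output);
        return bytes.toByteArray();
    }

    // Parse the bytes back and return the name of the root element.
    static String rootNameAfterParsing(byte[] xml) {
        try {
            Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return parsed.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = toUtf8Bytes(sampleDocument());
        System.out.println(new String(xml, StandardCharsets.UTF_8));
        System.out.println(rootNameAfterParsing(xml));
    }
}
```

The point of the design is that no intermediate `String` is involved, so there is no second place where an encoding has to be chosen by hand.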
user777466
  • See also http://stackoverflow.com/questions/5138696/org-xml-sax-saxparseexception-content-is-not-allowed-in-prolog – Raedwald Jul 18 '14 at 09:49

4 Answers

But where is the error in the XML writing?

It looks like the error is not in the writing but in the parsing. As you have already discovered, there is a blank character at the beginning of the file, which causes the error in the parse call in your stack trace:

Document doc = builder.parse(f);

The reason you do not see the space when you print it out may simply be the encoding you are using. Try changing this line:

OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(file),"UTF-8");

to use 'UTF-16' or 'US-ASCII'.

Terrell Plotzki

I think that it is probably linked to a BOM (Byte Order Mark). See Wikipedia.

You can verify this with Notepad++, for example: open your file and check the "Encoding" menu to see whether it is "UTF8 without BOM" or "UTF8 with BOM".
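
If you would rather check from Java than from an editor, here is a minimal sketch that tests whether some data starts with the UTF-8 BOM (the bytes 0xEF, 0xBB, 0xBF) and drops it before parsing. The class name `BomSketch`, the helper names and the sample XML are made up for illustration, and this only covers the UTF-8 BOM, not the UTF-16 ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class BomSketch {

    // True when the data begins with the UTF-8 byte order mark EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Returns the data without a leading UTF-8 BOM (unchanged if there is none).
    static byte[] stripUtf8Bom(byte[] data) {
        return startsWithUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws Exception {
        // Build a small in-memory "file" that starts with a BOM.
        byte[] body = "<root><nom>Terray</nom></root>".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[body.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(body, 0, withBom, 3, body.length);

        System.out.println("BOM present: " + startsWithUtf8Bom(withBom));

        // Feed the parser the data with the BOM bytes already consumed.
        byte[] clean = stripUtf8Bom(withBom);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(clean));
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

For a real file, `java.nio.file.Files.readAllBytes` would supply the byte array. If `startsWithUtf8Bom` returns false, the BOM is probably not the cause and the encoding named in the XML declaration is the next thing to check.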

fbdcw
  • To confirm this, look at the input with a hex editor. I ran into the same problem a while back and solved it by first consuming the BOM bytes before feeding the data to the validating parser. – Connor Doyle Aug 09 '11 at 19:23
  • The bytes to check are: 0xEF, 0xBB, 0xBF – fbdcw Aug 09 '11 at 19:25
  • Ah, mystery: with the "UTF-8" option, I get an ANSI file; and with the "UTF-16" option, I get a UTF-16 encoded file! – user777466 Aug 09 '11 at 20:09

Using UTF-16 is the way to go:

 OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(fileName),"UTF-16");

With this change, the file can be read back with no issues.
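
A likely reason this works: the DOM Load and Save API specifies that `LSSerializer.writeToString` always produces UTF-16 output, so the XML declaration inside the string names UTF-16, and writing the string as UTF-16 bytes makes the file match its own header. Here is a minimal sketch of that round trip, assuming the JDK's built-in DOM implementation; the class name `Utf16RoundTripSketch`, the helpers and the sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

public class Utf16RoundTripSketch {

    // Hypothetical sample document, shaped like the one queried with "root/nom".
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            Element nom = doc.createElement("nom");
            nom.setTextContent("Terray");
            root.appendChild(nom);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Same serialization call as in the question.
    static String serialize(Document doc) {
        DOMImplementationLS implLS =
                (DOMImplementationLS) doc.getImplementation().getFeature("LS", "3.0");
        LSSerializer ser = implLS.createLSSerializer();
        return ser.writeToString(doc);
    }

    // Parse the bytes back and return the name of the root element.
    static String rootNameAfterParsing(byte[] xml) {
        try {
            Document parsed = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return parsed.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String str = serialize(sampleDocument());
        System.out.println(str); // look at the encoding named in the declaration

        // Bytes that match the declaration: same effect as the "UTF-16" writer above.
        byte[] utf16 = str.getBytes(StandardCharsets.UTF_16);
        System.out.println(rootNameAfterParsing(utf16));

        // Bytes that contradict the declaration: the combination from the question.
        try {
            System.out.println(rootNameAfterParsing(str.getBytes(StandardCharsets.UTF_8)));
        } catch (IllegalStateException e) {
            System.out.println("UTF-8 bytes were rejected: " + e.getCause());
        }
    }
}
```

Note that a UTF-16 file is roughly twice the size for mostly-ASCII content, which is one reason to prefer a serializer that can be told to emit UTF-8 consistently.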

Kersy

Try this code:

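// The second argument of parse(InputStream, String) is a system ID, not an encoding,
// so decode the bytes explicitly and give the parser a character stream instead.
Reader reader = new InputStreamReader(new FileInputStream(file), "UTF-8");
Document doc = builder.parse(new InputSource(reader));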
Draken
hossein ketabi