
I'm using Apache Xerces 2.11.0 and Apache Xalan 2.7.1 and I'm having problems with additional carriage return characters in the serialized XML.

I have this (pseudo) code:

String myString = ...;
Document doc = ...;

Element item = doc.createElement("item");
item.appendChild(doc.createCDATASection(myString));

Transformer transformer = ...;
ByteArrayOutputStream stream = new ByteArrayOutputStream();
Result result = new StreamResult(stream);
transformer.transform(new DOMSource(doc), result);

myString contains \r\n line breaks (it is actually Base64-encoded data), but when I look at the serialized output, there are additional \r characters.

Input:

Line 1 \r\n
Line 2 \r\n
Line 3 \r\n

Output:

Line 1 \r\r\n
Line 2 \r\r\n
Line 3 \r\r\n

If I use createTextNode instead of createCDATASection, the output becomes even more interesting:

Line 1 &#13;\r\n
Line 2 &#13;\r\n
Line 3 &#13;\r\n

The additional character seems to be introduced during serialization; the DOM tree itself seems to be correct (according to getTextContent()).

Why is this happening? What can I do to fix this?

Daniel Rikowski

3 Answers


I guess you are seeing this on Windows and not on Linux/Solaris/Mac. The Xalan serializer (org.apache.xml.serializer.ToStream) gets the line separator from System.getProperty("line.separator"). When the serializer writes \r\n, it treats the \n as the end-of-line sequence and replaces it with the platform line separator, so it actually writes \r + lineSeparator = \r\r\n. Although this sounds strange, it is not considered a bug, see [1]. Because it was frequently reported as one, a Xalan extension property was added [2]. You can set it programmatically:

transformer.setOutputProperty("{http://xml.apache.org/xalan}line-separator","\n");

or

<xsl:output xalan:line-separator="&#10;" />

where xalan is a prefix bound to the namespace URI "http://xml.apache.org/xalan".
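
To make the mechanism concrete, here is a minimal sketch. The class name LineSeparatorDemo and both method names are made up for illustration. simulateNewlineRewrite is only a toy model of the behaviour described above (every \n replaced by the configured line separator), not Xalan's actual code. serialize repeats the question's setup with the extension property set; it only has the described effect when the Transformer in use really is Apache Xalan, and whether other implementations (such as the one bundled with the JDK) honour the property is an assumption I have not verified.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
final class LineSeparatorDemo {

    // Xalan extension property quoted in the answer above.
    static final String XALAN_LINE_SEPARATOR =
            "{http://xml.apache.org/xalan}line-separator";

    // Toy model (NOT Xalan's code): each '\n' becomes the line separator,
    // while a preceding '\r' is left untouched.
    static String simulateNewlineRewrite(String text, String lineSeparator) {
        return text.replace("\n", lineSeparator);
    }

    // Wraps the text in <item><![CDATA[...]]></item> and serializes it
    // with the requested line separator.
    static String serialize(String text, String lineSeparator) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element item = doc.createElement("item");
            item.appendChild(doc.createCDATASection(text));
            doc.appendChild(item);

            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            // Assumption: only Apache Xalan is known to act on this property.
            transformer.setOutputProperty(XALAN_LINE_SEPARATOR, lineSeparator);

            ByteArrayOutputStream stream = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(stream));
            // The identity transformer writes UTF-8 unless told otherwise.
            return new String(stream.toByteArray(), StandardCharsets.UTF_8);
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }
}
```

With a Windows-style separator, simulateNewlineRewrite("Line 1 \r\n", "\r\n") yields "Line 1 \r\r\n", which matches the output in the question; with "\n" as the separator the input comes back unchanged.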

[1] https://issues.apache.org/jira/browse/XALANJ-1660

[2] https://issues.apache.org/jira/browse/XALANJ-2093

Alex Giotis
  • Thank you! Trying to generate CSV files that Excel can process requires changing this. New lines in cells are LF and new rows use CRLF. Have not been able to find this information easily anywhere else on the internet. – Bae Jun 05 '13 at 00:48

Odd, but try doing transformer.setOutputProperty(javax.xml.transform.OutputKeys.INDENT, "no"); immediately after creating the transformer and see what happens.

Femi
  • Odd. What is the code to create the `Result result = ..` entry? Are you using a `Writer` or a `Stream`? – Femi Jun 11 '11 at 17:45

Try using Xerces 2.9.0, which is tested with Xalan 2.7.1 (2.9.0 ships in the Xalan package).

After I had problems with Xerces 2.11.0 I did the same.

Charlie Brown