
I am trying to add a carriage return character and line feed character to format XML so that each element appears on its own line in MS Notepad.

I have tried what was suggested in "How to pretty print XML from Java?", which adds a line feed character after each closing element. For more sophisticated editors like richText and gedit, the line feed character is enough. However, Notepad seems to require a carriage return as well in order to put each element on its own line.

Is there a way to introduce that by altering the Transformer's properties? If not, is there a way to do this without having to parse the whole XML document and add the carriage returns manually?

travega
  • Note that the [section on new line handling](http://www.w3.org/TR/REC-xml/#sec-line-ends) of the XML specification states that the desired new-line character in XML documents is the line-feed character. – buc Jan 16 '12 at 23:37
  • @buc I see, so adding a carriage return character will be outside of the XML standard spec? – travega Jan 16 '12 at 23:55
  • @travega I don't think whitespace between elements is significant, so not really. What the spec is probably trying to say is that software displaying XML should not require that you use a different character for new lines. – millimoose Jan 17 '12 at 00:07
  • @travega The spec says that an XML parser must understand CRLF, CR and LF (but not LFCR) as line breaks, but internally should replace them with a single LF. According to this, I think CRLFs will still be standard-compliant. However, single LFs will probably be more portable with less compliant XML parsers. Personally, I don't think that it's worth the effort to have Java emit those CRs, rather than using another editor. – buc Jan 17 '12 at 00:07
  • @buc That's great info. So it looks like the method I'm using to add "indentation" to the XML doc through Java (as per the link) is only adding the LF... So what I should be looking to do is add a CR before the LF during the "indentation" process? I know that Notepad will recognise a CR but swallows the LF characters. – travega Jan 17 '12 at 00:19
  • FYI - Wordpad will render line-feed characters in the document the way that you are expecting. Whenever I'm trying to view an XML file in Notepad and have that issue I just right-click and select `Open with->Wordpad` – Mads Hansen Jan 17 '12 at 00:56
  • @travega Basically yes, and I think Inerdial's solution will work, it's elegant, and isn't hard to implement at all. The other way would be digging up the source of the Transformer implementation, modifying and recompiling it, brr... – buc Jan 17 '12 at 08:59

2 Answers

3

You could make your own implementation of Writer that wraps an existing Writer / OutputStream, and replaces "\n" with "\r\n" on the fly while writing. If whitespace isn't significant in the XML text, this should be good enough. Then, pass instances of your wrapper to the code that outputs XML.
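A minimal sketch of such a wrapper, assuming a `FilterWriter` subclass is acceptable. The class name `CrlfWriter` is hypothetical and not part of any library. It emits a CR before every LF that is not already preceded by one, so existing CRLF pairs are not doubled:

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Writer;

/** Hypothetical helper: rewrites every bare LF as CRLF while writing. */
class CrlfWriter extends FilterWriter {
    // Remembers whether the previous character was a CR, so "\r\n" is left alone.
    private boolean lastWasCr = false;

    CrlfWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        if (c == '\n' && !lastWasCr) {
            out.write('\r'); // insert the missing carriage return
        }
        out.write(c);
        lastWasCr = (c == '\r');
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        // Route every character through write(int) so the LF check always runs.
        for (int i = off; i < off + len; i++) {
            write(buf[i]);
        }
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(s.charAt(i));
        }
    }
}
```

With the pretty-printing code from the linked question, this would presumably be used by wrapping the target writer, for example `new StreamResult(new CrlfWriter(fileWriter))`, where `fileWriter` stands for whatever `Writer` you already have. Because it writes one character at a time, putting a `BufferedWriter` underneath it is advisable for large documents.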

millimoose
1

Just do a simple Replace All in MS Word, replacing >< with >^p<, where ^p is the Paragraph Mark entry under Special in the Find and Replace dialog.
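For reference, the same replace-all can be sketched in Java rather than done by hand in Word. The class and method names below are hypothetical. Note the assumption: this blindly rewrites every `><` pair, including any that occur inside CDATA sections, comments or text content, so it is only suitable when such content is absent or the extra whitespace is harmless.

```java
/** Hypothetical helper mirroring the Word replace-all on an in-memory string. */
class TagBreaks {
    /** Inserts a CRLF between every adjacent pair of tags ("><"). */
    static String breakBetweenTags(String xml) {
        // String.replace substitutes every literal occurrence; no regex is involved.
        return xml.replace("><", ">\r\n<");
    }
}
```

Calling `TagBreaks.breakBetweenTags(xmlString)` on the serialized document before writing it to disk should give Notepad the CRLF line endings it expects.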

DaveShaw
Dave