50

I was looking to represent a carriage return within an xml node.
I have tried preserving whitespace, a hex entity, and a \n, all with no luck. I am viewing the result in a browser.

Example

<Quote>
Alas, poor Yorick!
I knew him
</Quote>

Thanks.

Gray
user273502

3 Answers

84

To insert a CR into XML, you need to use its character entity &#13;.

This is because compliant XML parsers must, before parsing, translate CRLF and any CR not followed by a LF to a single LF. This behavior is defined in the End-of-Line handling section of the XML 1.0 specification.
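
A minimal sketch of that behavior, assuming Python's standard-library `xml.etree.ElementTree` as the parser (the question does not name one, and the helper name `parsed_text` is made up for illustration). It parses the same text once with a literal CRLF and once with character references:

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_source: str) -> str:
    """Return the text content of the root element after parsing."""
    return ET.fromstring(xml_source).text


# A literal CRLF in the source is normalized to a single LF by the parser.
literal_crlf = parsed_text("<Quote>Alas, poor Yorick!\r\nI knew him</Quote>")

# Character references are resolved after end-of-line normalization,
# so the CR written as &#13; survives parsing.
char_ref = parsed_text("<Quote>Alas, poor Yorick!&#13;&#10;I knew him</Quote>")

print(repr(literal_crlf))
print(repr(char_ref))
```

Comparing the two `repr` values shows whether the CR reached the application; other parsers should behave the same way if they follow the End-of-Line handling rules.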

Dan Atkinson
Lachlan Roche
  • 18
    Minor point - CR is &#13; and not &#10; which is LF - refer to this ascii table for details ... http://www.asciitable.com/ ... – dodgy_coder Dec 19 '11 at 06:34
  • 6
    Yes, &#10; is LF, &#13; is CR. Windows world usually has CRLF sequences (that is &#13;&#10;), Linux has just LF (&#10;). – Luke Jan 30 '14 at 17:42
  • I was not able to get this to work in decimal, but it was fine in hex &#xA; – MikeF Apr 07 '16 at 01:28
  • 4
    OMG, thank you sooooo much! My tests were failing because of this... :D – insan-e Oct 27 '16 at 08:39
  • @MikeF Isn’t that an invalid character entity [because it isn’t decimal](https://gist.github.com/binki/d2d8da8cbfe4c5b50054ca371f2505c4)? Are you sure you’re feeding that to an XML parser? – binki Jul 24 '17 at 14:29
  • @binki The SAX parser I was using only accepted the hex value. Did not dig into the why. – MikeF Jul 25 '17 at 02:02
14

xml:space="preserve" has to work for all compliant XML parsers.

However, note that in HTML a line break in the source is just whitespace and is NOT rendered as a line break (a rendered break is represented with the <br /> (X)HTML tag); maybe this is the problem you are facing.

You can also add &#13; and/or &#10; to insert CR/LF characters.
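
If the goal is a visible line break in a browser, one option is to convert the parsed text to HTML yourself. The following is only a sketch in Python; the function name `text_to_html` is hypothetical, and it assumes the text has already been extracted from the XML node:

```python
from html import escape


def text_to_html(text: str) -> str:
    """Escape text for HTML and turn each line ending into a <br /> tag."""
    # Treat CRLF, lone CR, and LF alike, mirroring XML's own normalization.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Escape first so markup characters in the text are not interpreted.
    return escape(normalized).replace("\n", "<br />\n")


print(text_to_html("Alas, poor Yorick!\r\nI knew him"))
```

Escaping before inserting the `<br />` tags keeps the tags themselves from being escaped.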

Lucero
  • 1
    For the record the sequence on PCs at least is normally CR followed by LF but these days, a single LF is normally sufficient. – dodgy_coder Dec 19 '11 at 06:32
  • 1
    Also note that, even with `xml:space="preserve"`, the XML parser is still [required to replace `13 10` (and other sequences) with `10` before parsing](https://www.w3.org/TR/xml11/#sec-line-ends). You can enter a CR which is meant to survive parsing by [using a character entity reference such as `&#xD;`](https://www.w3.org/TR/xml11/#sec-common-syn). – binki Jul 24 '17 at 14:35
  • @binki why are you OK with xD == 13 but not xA == 10? – MikeF Jul 25 '17 at 12:22
  • 1
    @MikeF There’s never a need to encode `10`/`0xa` as a character entity. XML may be copied/pasted to different systems as text. If you do this, on some systems, a newline will be CRLF and on others LF or on others yet another character. Thus, the XML spec says the processor shall normalize different sorts of newlines to `10`/`0xa` to ensure that XML transferred as text *always is parsed to the same exact value*. So, you only need to entitize non-`10` characters including CR (U+000D), CR NEL (U+000D U+0085), and others listed at the W3C link. If I misunderstood your question please let me know. – binki Jul 25 '17 at 18:18
2

A browser isn't going to show you white space reliably. I recommend the Linux 'od' command to see what's really in there. Conforming XML parsers will respect all of the methods you listed.
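
Where `od` is not available, the same inspection can be roughly approximated in Python. This is a sketch, not a replacement for `od`; the helper name `line_ending_counts` and the sample bytes are made up for illustration:

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CRLF pairs, lone CRs, and lone LFs in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
    }


# Sample bytes standing in for a file's contents, e.g. open(path, "rb").read().
sample = b"<Quote>Alas, poor Yorick!\r\nI knew him</Quote>\n"

print(line_ending_counts(sample))
print(sample.hex(" "))  # space-separated hex dump (Python 3.8+)
```

Reading the file in binary mode matters here, because text mode may translate line endings before you see them.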

bmargulies