From the W3C XHTML 1.0 Recommendation:
4.7. White Space handling in attribute values
When user agents process attributes, they do so according to Section 3.3.3 of [XML]:
Strip leading and trailing white space.
Map sequences of one or more white space characters (including line breaks) to a single inter-word space.
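The two rules above can be sketched in a few lines of Python. This is only an illustration of the quoted rules, not code from the Recommendation; the function name `normalize_attribute_value` and the constant `XML_WS` are made up for this example.

```python
import re

# The four characters XML defines as white space:
# space, horizontal tab, carriage return, line feed.
XML_WS = " \t\r\n"


def normalize_attribute_value(value: str) -> str:
    """Apply the two quoted rules to an attribute value."""
    # Rule 1: strip leading and trailing white space.
    stripped = value.strip(XML_WS)
    # Rule 2: map each run of one or more white space characters
    # (including line breaks) to a single space.
    return re.sub(f"[{XML_WS}]+", " ", stripped)


print(repr(normalize_attribute_value("  foo \n\t bar  ")))
```

Note that the sketch passes the character set explicitly instead of calling `str.strip()` with no arguments, because Python's default notion of white space is broader than XML's (it includes form feed, for instance).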
For whitespace between tags, see section 3.2, criterion 9:
3.2. User Agent Conformance
[1-8 snipped]
9. White space is handled according to the following rules. The following characters are defined in [XML] white space characters:
SPACE (&#x0020;)
HORIZONTAL TABULATION (&#x0009;)
CARRIAGE RETURN (&#x000D;)
LINE FEED (&#x000A;)
The XML processor normalizes different systems' line end codes into one single LINE FEED character, that is passed up to the application.
The user agent must use the definition from CSS for processing whitespace characters [CSS2]. Note that the CSS2 recommendation does not explicitly address the issue of whitespace handling in non-Latin character sets. This will be addressed in a future version of CSS, at which time this reference will be updated.
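The line-end normalization mentioned in criterion 9 can be approximated as follows. This is a minimal sketch, assuming the usual XML 1.0 behaviour of turning a CR LF pair, or a lone CR, into a single LINE FEED; the function name `normalize_line_ends` is hypothetical, and a real XML processor does this internally before the application sees the text.

```python
def normalize_line_ends(text: str) -> str:
    """Collapse CR LF pairs and lone CRs into a single LINE FEED."""
    # Handle the two-character CR LF sequence first, so that it becomes
    # one LF instead of two, then convert any remaining lone CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_line_ends("a\r\nb\rc\nd")))
```

The order of the two replacements matters: converting lone CRs first would turn each CR LF pair into two line feeds.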
Also see section C.15:
C.15. White Space Characters in HTML vs. XML
Some characters that are legal in HTML documents, are illegal in XML document. For example, in HTML, the Formfeed character (U+000C) is treated as white space, in XHTML, due to XML's definition of characters, it is illegal.
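The form feed case is easy to check with an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` module as a stand-in for an XML processor; the helper name `is_well_formed` is made up for this example, and the behaviour shown is that of this particular parser, not something the Recommendation specifies about any implementation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# An ordinary space is accepted.
print(is_well_formed("<p>a b</p>"))
# A form feed (U+000C) is not a legal XML character, so parsing fails.
print(is_well_formed("<p>a\x0cb</p>"))
```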