Does the space character in HTML Living Standard mean only ASCII space character?

I know that the HTML4 specification defines whitespace as follows:

9.1 White space

The document character set includes a wide variety of white space characters. Many of these are typographic elements used in some applications to produce particular visual spacing effects. In HTML, only the following characters are defined as white space characters:

  • ASCII space (&#x0020;)
  • ASCII tab (&#x0009;)
  • ASCII form feed (&#x000C;)
  • Zero-width space (&#x200B;)

As a result, HTML4 treats every white space character other than those listed above as an ordinary character. This means that consecutive U+0020 characters are collapsed into one, while consecutive U+2009 (thin space) characters are not collapsed and are all preserved.

<h2>U+0020 is combined</h2>
<p>this is      loooooooooooo         ng text</p>
<h2>U+2009 (white space that is out of definition) is not combined</h2>
<p>this is       loooooooooooo           ng text</p>

I searched the WHATWG Living Standard for a description corresponding to this white space definition in HTML4, but I couldn't find it. Where is the definition of white space in HTML Living Standard?

I read the following articles, but these did not have the answer to my question.

  • There are many references => HTML character entity references https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references – Mister Jojo Oct 06 '19 at 01:25
  • In the second link I found a link to the [css white-space rule](https://www.w3.org/TR/CSS22/text.html#white-space-prop) that might be what you are looking for. – some Oct 06 '19 at 01:27

1 Answer

According to the HTML Living Standard, whitespace means "ASCII whitespace", a term it takes from the Infra Standard. Infra is listed as a dependency of the HTML Standard, and it defines ASCII whitespace as:

U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE.
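
To make the difference concrete, here is a minimal Python sketch of that five-character set, together with a helper modelled on Infra's "strip and collapse ASCII whitespace" operation. The names `ASCII_WHITESPACE`, `is_ascii_whitespace` and `strip_and_collapse` are my own, chosen for illustration, and this only models the string-level definition; how a browser collapses spaces when rendering is a separate matter governed by the CSS `white-space` property mentioned in the comments.

```python
import re

# The five code points listed above: TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "

# One or more consecutive ASCII whitespace code points.
# A literal character class is used on purpose: Python's \s would also
# match Unicode spaces such as U+2009, which are NOT ASCII whitespace.
_ASCII_WS_RUN = re.compile("[\t\n\x0c\r ]+")


def is_ascii_whitespace(ch: str) -> bool:
    """Return True only for a single ASCII whitespace code point."""
    return len(ch) == 1 and ch in ASCII_WHITESPACE


def strip_and_collapse(text: str) -> str:
    """Collapse each run of ASCII whitespace to one U+0020 SPACE,
    then remove leading and trailing ASCII whitespace."""
    collapsed = _ASCII_WS_RUN.sub(" ", text)
    # Pass the set explicitly: a bare .strip() would also remove
    # Unicode whitespace such as U+2009.
    return collapsed.strip(ASCII_WHITESPACE)


if __name__ == "__main__":
    thin = "\u2009"  # THIN SPACE, outside the ASCII whitespace set
    print(repr(strip_and_collapse("this is      loooo     ng text")))
    print(repr(strip_and_collapse("this is" + thin * 3 + "loooo ng text")))
```

With this sketch, a run of U+0020 is reduced to a single space, while a run of U+2009 passes through unchanged, which matches the behaviour described in the question. Note also that U+200B (zero-width space), which the HTML4 list included, is not part of the ASCII whitespace set.
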
Matt Davis