Unicode has lots of whitespace characters (newlines, tabs, spaces of various widths, control characters).

Do they all count as whitespace in HTML? That is, do they all get collapsed into a single space, or are characters like em spaces and hair spaces handled differently?

callum

1 Answer

HTML 4.01 explicitly says that it does not define the rendering of space characters other than ASCII space, ASCII tab, ASCII form feed, and zero-width space. Thus, characters like EM SPACE and HAIR SPACE have no defined rendering.

In practice, browsers render them using (empty) glyphs as available in the fonts used, so they “work” as intended, provided that the browser can find some font that contains them.

In doing so, browsers treat such characters the same way as normal printable characters. In effect, they are like letters that occupy some width but are empty. This means that they are not collapsed at all, and they are not stretched in justification (text-align: justify).
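
The behaviour described above can be modelled roughly in a few lines of Python. This is only an illustrative sketch, not what any browser actually runs: the function name `collapse_html_whitespace` is made up for the example, and the set of collapsible characters (ASCII space, tab, form feed, plus line breaks) is an assumption based on the list in HTML 4.01.

```python
import re

# Assumption: only these ASCII characters are collapsible, following the
# white space characters HTML 4.01 names, plus CR and LF line breaks.
_COLLAPSIBLE = re.compile(r"[ \t\f\r\n]+")

def collapse_html_whitespace(text: str) -> str:
    """Replace each run of collapsible ASCII white space with one space.

    Hypothetical helper: a rough model of the collapsing described above.
    """
    return _COLLAPSIBLE.sub(" ", text)

EM_SPACE = "\u2003"
HAIR_SPACE = "\u200a"

# A run of ASCII white space becomes a single space.
print(repr(collapse_html_whitespace("a \t\n  b")))

# EM SPACE and HAIR SPACE are left alone, like empty "letters".
print(repr(collapse_html_whitespace(f"a{EM_SPACE}{EM_SPACE}b{HAIR_SPACE}c")))
```

Note that a general-purpose pattern such as Python's `\s` matches Unicode whitespace in `str` patterns, so `re.sub(r"\s+", " ", text)` would also collapse EM SPACE. That is why the sketch spells out the ASCII characters explicitly.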

Since this is unspecified, however, there is no guarantee that things will not change. It is thus generally safer to create any desired spacing using CSS (possibly with extra inline markup added for that).

There does not seem to be any clarification or change in this in HTML5 drafts.

Jukka K. Korpela