
It seems that it is best to use the `&amp;` escape instead of simply typing the ampersand (&).

However, should we be using X/HTML character entity references for dashes and other common typographical characters when writing blog posts on CMSs like WordPress or hard-coding websites by hand?

For example:

`&ndash;` is an en dash (–)

`&mdash;` is an em dash (—)

What is the risk if we do not?

Why is the hyphen (-) never written as a character reference such as `&#45;`, but simply typed directly from the keyboard in HTML? (Assuming that it is the hyphen and not a minus sign.)

Baumr
  • I don't know, I've had too many encoding issues to be courageous enough to use the real characters. I guess this is more or less FUD. – Florian Margaine May 14 '13 at 18:57
  • possible duplicate of [When Should One Use HTML Entities](http://stackoverflow.com/questions/436615/when-should-one-use-html-entities) – Jukka K. Korpela May 14 '13 at 19:35

2 Answers


The W3C has published an official article on when to use, and when not to use, character escapes (linked at the end of this answer). Since the W3C is also the group in charge of the HTML specification, I think it's best to follow their advice.

From the section "When to Use Escapes"

Syntax characters. There are three characters that should always appear in content as escapes, so that they do not interact with the syntax of the markup. These are part of the language for all documents based on XML and for HTML.

  • &lt; (<)

  • &gt; (>)

  • &amp; (&)
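As a rough illustration, assuming Python 3 and its standard `html` module (`escape_text` is a hypothetical helper name, not something from the W3C article), escaping only these three syntax characters looks like this. The `&` has to be replaced first, and characters with no markup meaning, such as the hyphen and the dashes, pass through untouched.

```python
import html


def escape_text(s: str) -> str:
    """Escape the three markup-significant characters in HTML text content.

    '&' is replaced first; otherwise the '&' inside the freshly inserted
    '&lt;' and '&gt;' would be escaped a second time.
    """
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


sample = "Fish & chips <b>not bold</b> - \u2013 \u2014"  # hyphen, en dash, em dash
escaped = escape_text(sample)
print(escaped)

# For text content the standard library does the same job;
# quote=False leaves the " and ' characters unescaped.
assert escaped == html.escape(sample, quote=False)
```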

The article also mentions using escapes for characters that the current character encoding might not support.
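For characters the chosen encoding cannot represent, Python's codec error handlers can emit numeric character references automatically. This is a sketch assuming an ASCII-only target encoding; it does not touch `&`, `<` or `>`, so those would still need escaping separately.

```python
import html

text = "Pages 10\u201320 \u2014 see \u00a9 notice"  # en dash, em dash, copyright sign

# ASCII cannot represent the dashes or the copyright sign, so
# errors="xmlcharrefreplace" writes them as decimal references such as &#8211;
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)

# UTF-8 can represent every character directly, so nothing needs escaping.
utf8_bytes = text.encode("utf-8")

# A browser (or html.unescape) reads the references back as the same characters.
assert html.unescape(ascii_bytes.decode("ascii")) == text
```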

From the section "When Not to Use Escapes"

It is almost always preferable to use an encoding that allows you to represent characters in their normal form, rather than using character entity references or NCRs.

Using escapes can make it difficult to read and maintain source code, and can also significantly increase file size.
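To make the readability and file-size point concrete, here is a small sketch (Python 3.9+; `compare` is a hypothetical helper) showing that a named reference and the literal character denote the same thing, while the reference takes more bytes than the UTF-8 encoded character. The difference is only a few bytes per character, so it adds up mainly in text that uses many references.

```python
import html


def compare(ref: str) -> tuple[str, int, int]:
    """Return the character a reference denotes and both sizes in UTF-8 bytes."""
    char = html.unescape(ref)  # decode the character entity reference
    return char, len(ref.encode("utf-8")), len(char.encode("utf-8"))


for ref in ["&ndash;", "&mdash;", "&copy;"]:
    char, ref_bytes, char_bytes = compare(ref)
    print(f"{ref} -> {char!r}: {ref_bytes} bytes as a reference, {char_bytes} bytes as UTF-8")
```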

http://www.w3.org/International/questions/qa-escapes

Benjamin Gruenbaum
  • Thanks, good answer. But `&mdash;` isn't an "escape", it's a [character entity reference](http://en.wikipedia.org/wiki/Character_entity_reference) – Baumr May 14 '13 at 20:39
  • @Baumr If I understand correctly, the two are synonymous. When the W3C refers to a character escape in the referenced article, they mean a character entity reference. – Benjamin Gruenbaum May 14 '13 at 20:43
  • Worth mentioning: according to the HTML spec, an unescaped `>` in text content is legal. (That doesn't mean you shouldn't abide by the recommendations, though.) – Benjamin Gruenbaum Jun 03 '13 at 07:37
  • FYI, the PhpStorm editor displays most entities as the characters they represent. They're highlighted, and you can see the actual entity code by hovering over them, so source code containing entities is not difficult to read or maintain. – Bob Ray Aug 15 '21 at 20:30

Those entities are there to help you, the author, with characters you can't usually type on an average keyboard (the em dash is one example, as are `&copy;` and `&nbsp;`).
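The named entities are simply aliases for Unicode code points, which Python's `html.entities` tables make easy to check. This sketch assumes the HTML 4 names held in `codepoint2name`; it also shows that the ASCII hyphen-minus has no named entity in that table.

```python
from html.entities import codepoint2name, name2codepoint

# Each named entity maps to a single Unicode code point.
for name in ["ndash", "mdash", "copy", "nbsp"]:
    cp = name2codepoint[name]
    print(f"&{name}; -> U+{cp:04X} {chr(cp)!r}")

# The reverse table shows whether a character has an HTML 4 entity name at all.
print(codepoint2name.get(0x2014))    # mdash
print(codepoint2name.get(ord("-")))  # None: no named entity for the hyphen-minus
```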

You only need to escape the characters that have special meaning in (X)HTML: <, >, and &.

Madara's Ghost
  • Thanks Madara, I also tried to make the "escape" vs. "reference" distinction clear in my question, but good that you clarified it too – Baumr May 14 '13 at 19:06