36

I've recently noticed a lot of high-profile sites using characters directly in their source, e.g.:

<q>“Hi there”</q>

Rather than:

<q>&ldquo;Hi there&rdquo;</q>

Which of these is preferred? I've always used entities in the past, but using the character directly seems more readable, and would seem to be OK in a Unicode document.

mikemaccana
  • 1
    Obviously using the chars directly could screw it up in a non-Unicode doc, but yeah, if it's in UTF-8 it should work fine. +1, nice question; wondering myself – Thomas Shields Mar 21 '12 at 16:18
  • 2
    What is this "non-Unicode doc" of which you speak? :) (More seriously, this is 2012; we should have given up ISO-8859 and its proprietary friends a decade ago). – Quentin Mar 21 '12 at 17:02
  • 4
    HTML5 spec for charset also says 'authors are encouraged to use Unicode' http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#charset – mikemaccana Mar 22 '12 at 09:18

5 Answers

26

If the encoding is UTF-8, the normal characters will work fine, and there is no reason not to use them. Browsers that don't support UTF-8 will have lots of other issues while displaying a modern webpage, so don't worry about that.

So it is easier and more readable to use the characters and I would prefer to do so.

It also saves a couple of bytes which is good, although there is much more to gain by using compression and minification.
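
As a rough illustration of the size difference, the sketch below compares the two forms of the snippet from the question. It assumes the page is saved as UTF-8; the variable names are made up for this example, and the curly quotes are written as `\u201c` / `\u201d` escapes only so the Python source stays ASCII-safe.

```python
import html

# The two forms from the question; \u201c and \u201d are the curly quotes.
literal = "<q>\u201cHi there\u201d</q>"
entities = "<q>&ldquo;Hi there&rdquo;</q>"

# Both forms carry the same text once entity references are resolved.
same_text = html.unescape(entities) == literal

# Size on the wire, assuming the document is encoded as UTF-8.
literal_bytes = len(literal.encode("utf-8"))
entity_bytes = len(entities.encode("utf-8"))
saved = entity_bytes - literal_bytes

print(same_text)                           # True
print(literal_bytes, entity_bytes, saved)  # 21 29 8
```

Each curly quote takes 3 bytes in UTF-8 against 7 bytes for its entity reference, so the saving here is 4 bytes per quote mark.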

GolezTrol
  • 9
    the other advantage is that if you use the actual characters by default, then user-submitted content will look right even if it's not encoded - for example, on my blog I have an option to read in plaintext, which wouldn't work right if I used the html entities. – Thomas Shields Mar 21 '12 at 17:18
2

It is better to use characters directly. They make for easier-to-read code.

Google's HTML/CSS Style Guide recommends the same.

Mwiza
2

The main advantage I can see with encoding characters is that they'll look right, even if the page is interpreted as ASCII.

For example, if your page is just a raw HTML file, some servers default to serving it as text/html; charset=ISO-8859-1 (the default in HTTP/1.1). Even if you set the meta tag for content-type, the HTTP header has higher priority.

Whether this matters depends on how likely the page is to be served by a misconfigured server.
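
A minimal sketch of that failure mode, assuming the file was saved as UTF-8 but the client decodes it as ISO-8859-1. The variable names are invented for this example, and Python's `iso-8859-1` codec stands in for the client; real browsers may map that label to windows-1252, so the exact garbage on screen can differ.

```python
import html

text = "\u201cHi there\u201d"           # the quote with literal curly quotes
entity_form = "&ldquo;Hi there&rdquo;"  # the same text written with entities

# Literal characters saved as UTF-8, then misread as ISO-8859-1.
garbled = text.encode("utf-8").decode("iso-8859-1")

# The entity form is pure ASCII, so the wrong charset label leaves it intact.
survived = html.unescape(entity_form.encode("ascii").decode("iso-8859-1"))

print(garbled == text)   # False
print(survived == text)  # True
```

Each 3-byte quote turns into three unrelated characters when misread, while the entity references resolve to the intended quotes either way.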

Brendan Long
1

Use characters directly. They are easier to read in the source (which is important as people do have to edit them!) and require less bandwidth.

Quentin
1

The example given is definitely wrong, in theory as well as in practice, in HTML5 and in HTML 4. For example, the HTML5 description of the q element says: “Quotation punctuation (such as quotation marks) that is quoting the contents of the element must not appear immediately before, after, or inside q elements; they will be inserted into the rendering by the user agent.”

That is, use either “q” markup or punctuation marks, not both. The latter is better on all practical accounts.

Regarding the issue of characters vs. entity references, the former are preferable for readability, but then you need to know how to save the data as UTF-8 and declare the encoding properly. It’s not rocket science, and usually better. But if your authoring environment is UTF-8 hostile, you need not be ashamed of using entity references.
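
One way to get both halves right (saving as UTF-8 and declaring it) is sketched below. This is only an illustration: the file name `quote.html` and the page content are hypothetical, and the file is written to a temporary directory so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

# A small page that declares its encoding and uses literal curly quotes
# (written here as \u201c / \u201d escapes).
page = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Quote example</title>
</head>
<body>
<p>\u201cHi there\u201d</p>
</body>
</html>
"""

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "quote.html"
    # Encode explicitly instead of relying on the platform default encoding.
    path.write_bytes(page.encode("utf-8"))
    raw = path.read_bytes()

# The declaration is present and the quote is stored as its UTF-8 bytes.
declared_ok = b'<meta charset="utf-8">' in raw
stored_as_utf8 = "\u201c".encode("utf-8") in raw

print(declared_ok, stored_as_utf8)  # True True
```

The declaration in the file only helps if the server does not send a conflicting charset in its Content-Type header.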

Jukka K. Korpela
  • 1
    I actually add the quotes via CSS - I just wanted a smaller example for stackoverflow. PS - this would be better as a comment, as it's not part of your answer. – mikemaccana Mar 22 '12 at 08:59