
I thought UTF-8 would be able to handle a plain £ without having to convert it to an entity?

What's the proper way of handling the GBP symbol with UTF-8 and HTML5?

(P.S. I don't think the HTML5 part should make any difference.)


Update:

Here's a test document:

```html
<!doctype html>
<head>
  <meta charset="utf-8">
  <title>GBP Test</title>
</head>

<body>
£55
<br />
&pound;55
</body>
```

Thanks everyone for your help.

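For anyone else facing this frustration: the issue lies with the text editor, which can save the file in an encoding other than the UTF-8 the page declares. Even Notepad may save in a non-UTF-8 encoding (such as Windows-1252) unless told otherwise.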

SOLUTION:

I changed the read and write encodings in my text editor (PHP Designer) to UTF-8.
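A minimal Python sketch of the same fix, for illustration only (the thread itself uses PHP Designer and Notepad, not Python): write the test document with an explicit UTF-8 encoding, then check that the bytes on disk really decode as UTF-8. The helper names `save_utf8` and `is_valid_utf8` are hypothetical, and cp1252 is assumed here as a stand-in for a legacy "ANSI" editor setting.

```python
import tempfile
from pathlib import Path

# The test document from the question; "\u00a3" is the literal £ sign.
TEST_DOC = """<!doctype html>
<head>
  <meta charset="utf-8">
  <title>GBP Test</title>
</head>
<body>
\u00a355
<br />
&pound;55
</body>
"""


def save_utf8(path: Path, html_text: str) -> None:
    """Write html_text as UTF-8 so the bytes match <meta charset="utf-8">.

    Without an explicit encoding, Python (like many editors) may fall back to
    a platform default such as cp1252.
    """
    path.write_text(html_text, encoding="utf-8")


def is_valid_utf8(path: Path) -> bool:
    """True if the whole file decodes as UTF-8 without errors."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp, "gbp-utf8.html")
    bad = Path(tmp, "gbp-cp1252.html")
    save_utf8(good, TEST_DOC)
    # Simulates an editor saving in a legacy code page: £ becomes the lone byte 0xA3.
    bad.write_text(TEST_DOC, encoding="cp1252")
    results = {"utf-8": is_valid_utf8(good), "cp1252": is_valid_utf8(bad)}

print(results)  # {'utf-8': True, 'cp1252': False}
```

The cp1252 file still *declares* UTF-8, which is exactly the mismatch that makes the browser show ?55 for the literal £ while &pound;55 (pure ASCII) survives.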

Haroldo
  • What's your page encoding declared as (either in the HTTP header, or in the ``)? – Matt Ball Nov 18 '10 at 15:07
  • What is the problem? I think UTF-8 can handle this character. – MatTheCat Nov 18 '10 at 15:07
  • @matts: updated with a sample page – Haroldo Nov 18 '10 at 15:44
  • @Haroldo: what about the example? It works perfectly. – Konrad Rudolph Nov 18 '10 at 15:54
  • @Konrad - you get two £55s?!!! I've just tried it on two different computers and get £55 and ?55! – Haroldo Nov 18 '10 at 15:55
  • @Haroldo works for me (http://jsfiddle.net/VRzVk/), so you probably have an encoding problem. Check the browser's "encoding" or "character set" menu. Where is the check mark? I bet it's not on UTF-8. – Pekka Nov 18 '10 at 15:58
  • @Haroldo your page is somehow screwing up the content-type information. Can you show a live link? – Pekka Nov 18 '10 at 16:39
  • Interesting: the jsFiddle works fine for me, but if I copy the source into Notepad and save it as a .html file, it doesn't work. – Haroldo Nov 18 '10 at 16:42
  • @Haroldo: What encoding do you specify when saving? Chances are, it’s not UTF-8. By default, Notepad on Windows (before Windows Vista or 7) used to use something else, e.g. Windows-1252. – Konrad Rudolph Nov 18 '10 at 17:01
  • @Haroldo please, as said, check the encoding the browser actually uses. – Pekka Nov 18 '10 at 17:06
  • @Pekka - I don't see the relevance; I haven't changed the settings on any of the browsers on either my computer or my girlfriend's! The fact that the Fiddle link works shows the problem is with the writing of the file, not the reading of it. – Haroldo Nov 18 '10 at 17:15
  • @Haroldo looking up what character set the browser uses would tell you *for sure* whether it's a file writing problem, or a server problem. It *is* relevant – Pekka Nov 18 '10 at 17:16
  • @Pekka - thanks. I've tried saving the source code rather than copying and pasting it, and that works, so the problem must be in my text editor(s). I'll look into text editor encoding now. Thanks for all your time; that jsFiddle link was a bit of a breakthrough. – Haroldo Nov 19 '10 at 10:15

2 Answers


Just use the character. It will work fine.

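The symbol is encoded differently in UTF-8 than in ISO-8859-1, of course: its code point is U+00A3 in both, but ISO-8859-1 stores it as the single byte 0xA3, while UTF-8 stores it as the two bytes 0xC2 0xA3. An ISO-8859-1-encoded pound sign will therefore not work in a UTF-8 document, and vice versa. You'd have to convert it.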
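To make the byte-level difference concrete, here is a small Python sketch (Python is chosen only for illustration; `save_as`, `render_as` and `latin1_to_utf8` are hypothetical helper names). It shows the bytes each encoding produces for £55, what happens when those bytes are read under the wrong charset, and the conversion from ISO-8859-1 to UTF-8.

```python
POUND_55 = "\u00a355"  # "£55"; £ is code point U+00A3 in every Unicode encoding


def save_as(text: str, encoding: str) -> bytes:
    """Bytes an editor would write to disk when saving `text` in `encoding`."""
    return text.encode(encoding)


def render_as(raw: bytes, charset: str) -> str:
    """Decode bytes the way a browser would under `charset`.

    Invalid byte sequences become U+FFFD (often displayed as ? or a diamond).
    """
    return raw.decode(charset, errors="replace")


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


latin1_file = save_as(POUND_55, "iso-8859-1")
utf8_file = save_as(POUND_55, "utf-8")

print(latin1_file)  # b'\xa355'     (one byte for £)
print(utf8_file)    # b'\xc2\xa355' (two bytes for £)

# Declared UTF-8 but saved as ISO-8859-1: the lone 0xA3 byte is invalid UTF-8.
print(render_as(latin1_file, "utf-8") == "\ufffd55")  # True
# Saved as UTF-8 but read as ISO-8859-1: the classic "Â£" mojibake.
print(render_as(utf8_file, "iso-8859-1") == "\u00c2\u00a355")  # True
print(latin1_to_utf8(latin1_file) == utf8_file)  # True
```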

Related: When Should One Use HTML Entities

Pekka
  • It doesn't seem to work in the example above (just added to the question). Is that because my text editor saved it as something other than UTF-8, or is there some other reason? – Haroldo Nov 18 '10 at 15:45
  • @Haroldo not sure whether `meta charset` will work. Try @Matt Ball's suggestion and check whether the browser really understands it (in the encoding menu) – Pekka Nov 18 '10 at 15:50
  • `` didn't work either – Haroldo Nov 18 '10 at 15:54

The short answer is that you don't need to use entities for most characters as long as you declare the document's character set as UTF-8 (using a Content-Type header, a meta charset element in the head, or the encoding in the XML declaration with XHTML)...
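As a rough sketch of the first option (an HTTP `Content-Type` header), assuming a page served by Python's standard-library `http.server` rather than the asker's actual setup: the body is encoded as UTF-8 and the header declares the same charset, so the literal £ needs no entity. `PAGE`, `utf8_html_response` and `GbpHandler` are hypothetical names.

```python
from __future__ import annotations

from http.server import BaseHTTPRequestHandler

# Hypothetical page; "\u00a3" is a literal £ with no entity.
PAGE = '<!doctype html>\n<title>GBP Test</title>\n<p>\u00a355</p>\n'


def utf8_html_response(page: str) -> tuple[dict[str, str], bytes]:
    """Encode `page` as UTF-8 and build headers declaring that same charset."""
    body = page.encode("utf-8")
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),  # byte length, not character count
    }
    return headers, body


class GbpHandler(BaseHTTPRequestHandler):
    """Serves PAGE with the charset declared at the HTTP level."""

    def do_GET(self) -> None:
        headers, body = utf8_html_response(PAGE)
        self.send_response(200)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), GbpHandler).serve_forever()` (with `HTTPServer` also imported from `http.server`) should serve the page at http://127.0.0.1:8000/. The key point is that the header and the actual bytes agree; a header saying UTF-8 over Latin-1 bytes would fail the same way the mis-saved file did.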

The only characters you NEED to encode in a UTF-8 HTML document are (depending on the context):

  • &amp; => &
  • &lt; => <
  • &gt; => >
  • &quot; => "

And if you are using XHTML (which is also valid XML), you also need to encode single quotes, using one of the following (again, depending on the context):

  • &apos; => '
  • &#39; => '
  • &#x0027; => '

(Note that the last two are preferred, since &apos; is not defined in HTML 4; HTML5 does define it, but the numeric forms work everywhere.)

Also note that &, < and > need to be escaped everywhere, and " and ' only need to be escaped inside of the appropriate attribute (so if an attribute is quoted using ", you'd need to escape all other " characters inside of that attribute)...
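Python's standard-library `html.escape` happens to escape exactly this set, and its `quote` flag maps onto the text-versus-attribute distinction above. A minimal sketch, with `escape_text` and `escape_attr` as hypothetical wrapper names; note that with `quote=True` it writes `'` as the numeric `&#x27;`, in line with the preference over `&apos;`.

```python
import html


def escape_text(s: str) -> str:
    """Escape for element content: only &, < and > are replaced."""
    return html.escape(s, quote=False)


def escape_attr(s: str) -> str:
    """Escape for a quoted attribute value: also replaces " and ' (as &#x27;)."""
    return html.escape(s, quote=True)


print(escape_text('\u00a355 & "free" <delivery>'))  # £55 &amp; "free" &lt;delivery&gt;
print(escape_attr("Tom's \u00a35 deal"))            # Tom&#x27;s £5 deal
```

Neither function touches £: in a UTF-8 document it can stay a literal character.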

See the HTML 5 Draft for more information...

ircmaxell