Something has made me curious: supposedly the default character encoding in HTML5 is UTF-8. However, if I have a plain, simple HTML file with an HTML5 doctype, like the code below, I get:

"hello" in Russian: "ЗдраÑтвуйте"

In Chrome 33+, Safari 6, IE11, etc.

<!DOCTYPE html>
<html>
<head></head>
<body>
    <p>"hello" in Russian is "здраствуйте"</p>
</body>
</html>

What gives? Shouldn't the browser use UTF-8 and display the text correctly? I'm using Coda, which is set to save HTML files with UTF-8 encoding by default, so that's not the problem.

dkugappi
  • You can save your file in any encoding you want, but the browser runs on the user's system, not yours, and you never know what settings their browser has. – All Blond Apr 24 '14 at 20:17
  • "hello" in Russian is "здраствуйте": that's wrong! "Hello" in Russian is "здравствуйте"! – Dmytro Nov 03 '16 at 19:54

2 Answers

The text data in the example is UTF-8 encoded text misinterpreted as windows-1252 encoded. The reason is that the encoding has not been specified, so browsers are forced to make a guess. To fix this, specify the encoding; see the W3C page "Character encodings". There are two simple ways that work independently of server settings, as long as the server does not send wrong encoding information in the HTTP headers:

1) Save the file as UTF-8 with BOM (there is probably an option for this in your authoring program).

2) Add the following tag to the head section:

<meta charset=utf-8>
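To see the misinterpretation concretely, here is a small Python sketch (the file name hello.html and the temporary directory are arbitrary choices, and the text uses the spelling suggested in the comments). It decodes the UTF-8 bytes of the Russian text as windows-1252, roughly the way a guessing browser does, then applies fix 1 by writing the page with a BOM through Python's "utf-8-sig" codec, with fix 2 (the meta tag) already in the head.

```python
import os
import tempfile

# Illustrative page text (spelled as suggested in the comments).
TEXT = '"hello" in Russian is "здравствуйте"'

# The question's page with fix 2 applied: <meta charset=utf-8> in the head.
PAGE = f"""<!DOCTYPE html>
<html>
<head><meta charset=utf-8></head>
<body>
    <p>{TEXT}</p>
</body>
</html>
"""

# The editor saves UTF-8 bytes...
utf8_bytes = TEXT.encode("utf-8")

# ...but with no declaration the browser guesses windows-1252 ("cp1252").
# Python's cp1252 codec leaves a few bytes undefined (e.g. 0x81, part of "с"),
# so errors="replace" turns them into U+FFFD; browsers map them to invisible
# control characters instead.
garbled = utf8_bytes.decode("cp1252", errors="replace")
print(garbled)  # mojibake starting with: "hello" in Russian is "Ð·Ð´Ñ€Ð°

# Decoding the same bytes as UTF-8 gives the original text back.
assert utf8_bytes.decode("utf-8") == TEXT

# Fix 1: the "utf-8-sig" codec writes the UTF-8 BOM (EF BB BF) before the text.
path = os.path.join(tempfile.mkdtemp(), "hello.html")
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(PAGE)

with open(path, "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'
```

Either fix alone is enough for the browser to stop guessing; the sketch just shows both in one file.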

There is no single default encoding specified for HTML5. On the contrary, browsers are expected to make guesses when no encoding has been declared. This is a fairly complex process, described in section 8.2.2.2, "Determining the character encoding", of the HTML5 specification.
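As a rough illustration only, the Python sketch below follows the broad order of the encoding sniffing algorithm in the current WHATWG HTML standard, where a BOM takes precedence even over the HTTP header (older browsers may differ). It is heavily simplified: the real algorithm also handles user overrides, uses a much stricter meta prescan, and may use locale-dependent defaults or content-based detection. The function name sniff_encoding and the windows-1252 fallback are assumptions for this example; the fallback just matches what the browsers in the question appear to do.

```python
import re
from email.message import Message
from typing import Optional

# Byte order marks are checked first.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
]

# Crude stand-in for the spec's <meta> prescan. It accepts both
# <meta charset=utf-8> and <meta http-equiv=... content="...; charset=...">.
META_CHARSET = re.compile(rb"""<meta[^>]*charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def sniff_encoding(body: bytes, content_type: Optional[str] = None,
                   fallback: str = "windows-1252") -> str:
    """Return the encoding a browser would (roughly) pick for `body`."""
    # 1. A byte order mark wins, even over the HTTP header.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header, if any.
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lower-cased, or None
        if charset:
            return charset
    # 3. A <meta> declaration near the top of the document.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: fall back to a guess (assumed windows-1252 here).
    return fallback


print(sniff_encoding(b"<!DOCTYPE html><html><head></head></html>"))  # windows-1252
print(sniff_encoding(b"<head><meta charset=utf-8></head>"))          # utf-8
```

The question's page falls through to step 4, which is exactly where the guessing happens.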

Jukka K. Korpela

If you want to be sure which charset the browser will use, you must put this in your page's head:

 <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

Otherwise you are at the mercy of local settings and the browser's automatic detection.
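The http-equiv form stands in for the HTTP Content-Type header. If you control the server, you can also send that header for real; as the first answer's caveat implies, a charset in the HTTP header takes precedence over a meta tag, so the two should agree. A minimal sketch using Python's standard WSGI interface (the app and page are illustrative, not taken from either answer):

```python
# Page using the http-equiv form from this answer.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body>
    <p>"hello" in Russian is "здравствуйте"</p>
</body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that also declares UTF-8 in the real HTTP header."""
    body = PAGE.encode("utf-8")  # must match the declared charset
    start_response("200 OK", [
        # Same value as the meta tag's content attribute.
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, pass it to wsgiref.simple_server.make_server("", 8000, app) and call serve_forever() on the returned server.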

All Blond