Today I've started my first HTML page. Where is the page encoding stored exactly?

At first, é turned into Ã©. Then I used my text editor to save the file with a specific encoding. "UTF-8" didn't work, but "ISO 8859-1" did. How did my browser know the file was encoded as "ISO 8859-1"?

I can't see it anywhere in my file, so I'm very curious about where the info is stored.

Leigh
aria
  • http://stackoverflow.com/questions/4696499/meta-charset-utf-8-vs-meta-http-equiv-content-type – Thilo Apr 21 '17 at 02:33

2 Answers

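A plain text file does not record its encoding anywhere, apart from an optional byte order mark (BOM) at the very start; the encoding is simply the scheme your editor used to turn characters into bytes. Notepad++ and similar programs usually provide a number of options to view and change it.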

Additionally, you can provide a value by using the meta tag:

  • <meta charset="UTF-8"> (HTML5)
  • <meta http-equiv="Content-Type" content="text/html;charset=utf-8"> (HTML4)

Browsers use those tags to decide how to decode your file. However, the tags do not change the encoding of the file itself, and browsers can ignore the declaration. That seems to be what is happening in your case: your file is saved in encoding A, and the browser is trying to read it as encoding B.
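
To illustrate where such a declaration sits, here is a rough Python sketch that looks for a charset in the first 1024 bytes of a page, which is the region browsers prescan. The regular expression and the name sniff_meta_charset are my own simplifications for illustration; real browsers use a more careful algorithm.

```python
import re

# Crude pattern covering both the HTML5 and the HTML4 style of meta tag.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)

def sniff_meta_charset(html_bytes: bytes):
    """Return the declared charset in lowercase, or None if none is found."""
    # Only the start of the document is examined, as in a browser prescan.
    match = META_CHARSET.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None

html5 = b'<!DOCTYPE html><html><head><meta charset="UTF-8"></head></html>'
html4 = b'<meta http-equiv="Content-Type" content="text/html;charset=utf-8">'

print(sniff_meta_charset(html5))         # utf-8
print(sniff_meta_charset(html4))         # utf-8
print(sniff_meta_charset(b"<p>hi</p>"))  # None
```

Note that this only reports what the page claims; it says nothing about the bytes the editor actually wrote.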

The default encoding can also be defined (and overridden) by your server, which sends it in the Content-Type HTTP header. A sample .htaccess encoding configuration:

AddDefaultCharset utf-8
AddType 'text/html; charset=utf-8' .html .htm .shtml
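
With a configuration like the one above, the server would be expected to announce the charset in the Content-Type response header. A minimal Python sketch of reading that value follows; the function name charset_from_content_type is made up for illustration, and the header strings are examples rather than output from a real server.

```python
from email.message import Message

def charset_from_content_type(header_value: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset in lowercase, or None when no charset is given.
    return msg.get_content_charset()

print(charset_from_content_type("text/html; charset=utf-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

When the header carries no charset, the browser has to fall back on a meta tag or on guessing.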

UTF-8 is the recommended encoding standard for the web.

Pyromonk

The UTF-8 encoding for é is the two hex bytes C3A9.
C3 A9, when interpreted as ISO 8859-1, is two characters: Ã©.

Browsers tend to guess correctly at the encoding. Or you can explicitly tell the browser how to interpret the bytes. Try that out -- you will probably see the text change between é and Ã©.
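
The same mismatch can be reproduced outside the browser. This is only an illustrative Python sketch, with variable names of my own choosing:

```python
# Encode é as UTF-8, then decode the same bytes with the wrong charset.
text = "é"
utf8_bytes = text.encode("utf-8")          # two bytes: C3 A9
misread = utf8_bytes.decode("iso-8859-1")  # each byte becomes its own character

print(utf8_bytes.hex(" ").upper())  # C3 A9
print(misread)                      # Ã©
```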

A third case is when "double encoding" occurs. That is, somehow, the Ã© is itself encoded as UTF-8, giving hex C383 C2A9.
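
A short Python sketch of how double encoding can arise, and how it can be undone when nothing else has damaged the data. The variable names are illustrative only:

```python
# UTF-8 bytes are misread as ISO 8859-1, then the result is encoded as UTF-8 again.
original = "é"
misread = original.encode("utf-8").decode("iso-8859-1")  # "Ã©"
double_encoded = misread.encode("utf-8")

print(double_encoded.hex(" ").upper())  # C3 83 C2 A9

# Reversing the two steps recovers the original character.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1").decode("utf-8")
print(repaired)  # é
```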

So, to really be sure of what is going on, you need to get the HEX.
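
One way to get at the hex is to read the raw bytes and look for the byte patterns of é. The sketch below is an assumption-laden helper, not a general detector: the names hex_dump and describe_e_acute are hypothetical, and the single byte E9 for ISO 8859-1 comes from that character set's definition rather than from the answer above.

```python
def hex_dump(data: bytes) -> str:
    """Render bytes as space-separated uppercase hex."""
    return data.hex(" ").upper()

def describe_e_acute(data: bytes) -> str:
    """Guess how é was stored by looking for its known byte patterns."""
    # Patterns are checked from longest to shortest.
    if b"\xc3\x83\xc2\xa9" in data:
        return "double-encoded UTF-8 (C3 83 C2 A9)"
    if b"\xc3\xa9" in data:
        return "UTF-8 (C3 A9)"
    if b"\xe9" in data:
        return "ISO 8859-1 (E9)"
    return "no é found"

# Sample byte strings standing in for the contents of a saved file.
samples = [
    "café".encode("iso-8859-1"),
    "café".encode("utf-8"),
    "café".encode("utf-8").decode("iso-8859-1").encode("utf-8"),
]
for data in samples:
    print(hex_dump(data), "->", describe_e_acute(data))
```

For a real page, the bytes would come from opening the file in binary mode, for example open(path, "rb").read(), where path is whatever your HTML file is called.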

Rick James