Unicode and UTF-8 difference, lot of inconsistencies from the whole internet

Question

I know there are a lot of answers about this subject, but I need some clarification.

From what I've understood, ASCII and Unicode are both charsets: they tell you that A is hexadecimal 41 (decimal 65) and B is hexadecimal 42 (decimal 66), for example.

UTF-8, UTF-16, UTF-32, and ANSI are encodings: they are tasked with storing those numbers in a binary form of their choosing, and with reading that binary form back into the same numbers. Then, with the charset, you can look up the corresponding character.

But I was looking into how to find out which charset/encoding a webpage uses, so I opened Tools > Page Info in Firefox.

And I can read this: charset=utf-8

(this is the page: http://www.leboncoin.fr/annonces/offres/ile_de_france/)

Is this a bug in Firefox? Or, did I completely misunderstand charset/encoding?

score 0 · Answer 1 · edited May 23 '17 at 11:59

You have slightly misunderstood character sets, though this is not a big issue. A character set is just the set of available characters; it doesn't have to assign them any numbers (though it almost always does). See also: What's the difference between encoding and charset?

The real issue here is the use of charset. It comes from an HTML5 meta tag that often looks something like this:

<meta charset="utf-8" />

Despite the name, charset actually specifies a character encoding in HTML5, not a character set. This is likely due to historical confusion between character sets and encodings, as there was not much difference between the two before Unicode introduced multiple encodings for a single character set.
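
To make the distinction concrete, here is a minimal Python 3 sketch. The helper name encoded_forms is made up for illustration, and the three encodings are picked only as examples. It shows that a character has a single Unicode code point, while the bytes that represent it depend on the encoding chosen:

```python
def encoded_forms(ch):
    """Return the hex bytes of one character under three Unicode encodings."""
    # "-be" selects big-endian byte order and suppresses the byte order mark,
    # so the output contains only the bytes for the character itself.
    names = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: ch.encode(name).hex() for name in names}


for ch in "Aé€":
    # ord() gives the code point: a property of the character set (Unicode),
    # independent of how the character is later stored as bytes.
    print(f"U+{ord(ch):04X}", encoded_forms(ch))
```

For A, the code point is U+0041 (decimal 65) under every encoding, but UTF-8 stores it as one byte, UTF-16 as two, and UTF-32 as four. This is why a page has to declare its encoding (utf-8) and not just its character set.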
