I have read Joel's article about encodings. As I understand it, in the case of Unicode:

  1. Unicode is a character set: a mapping between integer values (code points) and characters
  2. UTF-8 is an encoding: it defines how those Unicode integers are represented as bytes
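
As a rough illustration of the two points above, here is a minimal Python sketch. The variable names are only for illustration, and it assumes Python 3, where `str` holds Unicode code points and `bytes` holds encoded data:

```python
ch = "é"

# Character set view: the character corresponds to an integer (its code point).
code_point = ord(ch)             # 233, i.e. U+00E9

# Encoding view: UTF-8 turns that integer into a byte sequence.
utf8_bytes = ch.encode("utf-8")  # b'\xc3\xa9' (two bytes)

print(hex(code_point), utf8_bytes)
```

The code point is a single number, while the UTF-8 form of the same character takes two bytes, which is why the two concepts are kept apart.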

What about iso-8859-1? Is it an encoding, a character set, or both?

mtkachenko
  • Possible duplicate of [What is ANSI format?](http://stackoverflow.com/questions/701882/what-is-ansi-format) – CodeCaster Jul 22 '16 at 08:54
  • It is an encoding of a specific character set. Unicode came about to solve the disaster caused by these 8-bit encodings. Way too many of them in common use, companies like Microsoft, Apple, Adobe, IBM made their own with incompatible choices on what characters were part of the character set. ISO solved the problem by adding 16 more ways to get it wrong. Don't use it. – Hans Passant Jul 24 '16 at 11:34

2 Answers

ISO 8859-1 (Latin-1) is a single-byte encoding whose byte values map one-to-one onto the first 256 Unicode code points (U+0000 to U+00FF). Since its repertoire is a subset of the Unicode character set, I suppose it can be treated as both an encoding and a character set.
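
A quick way to check this correspondence is sketched below. It assumes Python 3 and its built-in `iso-8859-1` codec, which also covers the control-code ranges; the variable names are only for illustration:

```python
# Decode every possible byte value with the ISO-8859-1 codec.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# Each byte value N becomes the Unicode code point U+00NN.
code_points = [ord(c) for c in decoded]

# Encoding the text again gives back exactly the original bytes.
round_trip = decoded.encode("iso-8859-1")

# A character outside the first 256 code points cannot be encoded.
try:
    "€".encode("iso-8859-1")   # U+20AC is not part of Latin-1
    euro_encodable = True
except UnicodeEncodeError:
    euro_encodable = False
```

Here `code_points` is simply 0 through 255, and the euro sign is rejected because it lies outside that range.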

Vasiliy Vlasov

What about iso-8859-1? Is it an encoding, a character set, or both?

Historically, it was described as a coded character set: it defined both a set of characters, and a mapping of those characters to byte values — what we would today call an encoding, but it was not explicitly described in those terms.

When Unicode was created, it was designed to encompass (nearly) all characters in widely-used character sets, and hence it recast the byte stream defined by the ISO-8859-1 coded character set as an encoding of the wider Universal Character Set.

So if you are working in a modern Unicode environment you would consider ISO-8859-1 to be an encoding. But it can't really be said to be wrong to consider it also a character set.

(There are other encodings which are definitely not character sets: for example the UTFs, and multibyte encodings like Shift-JIS, which was itself defined as an encoding for the JIS X 0208 character set prior to Unicode's extend-and-embrace.)
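
To make the "encoding, not character set" point concrete, here is a small Python sketch. It assumes Python 3 and its standard codec names; the variable names are only for illustration. It encodes one and the same Unicode character in several ways:

```python
ch = "é"  # a single character, code point U+00E9

# The same code point yields different byte sequences per encoding.
encodings = ["iso-8859-1", "utf-8", "utf-16-le", "utf-16-be"]
encoded = {name: ch.encode(name) for name in encodings}

# Every byte sequence decodes back to the same character.
decoded_back = {name: data.decode(name) for name, data in encoded.items()}

# Shift-JIS is likewise just a byte-level encoding: this hiragana letter
# is U+3042 in Unicode but is stored as a two-byte sequence in Shift-JIS.
kana = "あ"
sjis = kana.encode("shift_jis")

for name, data in encoded.items():
    print(name, data)
print("shift_jis", sjis)
```

Only the ISO-8859-1 bytes happen to equal the code point value itself, which is why that one standard can be read as both an encoding and a character set.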

bobince