1

UTF-8 "is a variable-width encoding that can represent every character in the Unicode character set" (Wikipedia), while Unicode is a "standard for the consistent encoding, representation and handling of text" (Wikipedia). They're different things. Why does Windows Notepad offer to save a document as either "Unicode" or "UTF-8"? How can I compare two different things?

mtkachenko

1 Answer

3

To simplify, Unicode says which number (code point) represents each character. UTF-8 says how to arrange the bits to encode a sequence of those Unicode values.

According to this thread, what "Unicode" means in Notepad is UTF-16 Little Endian (UTF-16LE), which is another way of arranging the bits to form strings of Unicode values.
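As a rough illustration of that distinction, here is a minimal Python sketch. The helper name `describe` and the two sample characters are my own choices for the example, not anything Notepad itself uses. It shows that a character has one Unicode number but different byte layouts depending on the encoding:

```python
def describe(ch):
    """Return (code point, UTF-8 bytes, UTF-16LE bytes) for one character.

    `describe` is a hypothetical helper written for this example only.
    """
    return (ord(ch), ch.encode("utf-8"), ch.encode("utf-16-le"))


# Sample characters chosen for illustration: one ASCII, one non-ASCII.
for ch in "A€":
    code_point, utf8_bytes, utf16_bytes = describe(ch)
    print(f"U+{code_point:04X}  utf-8={utf8_bytes.hex()}  utf-16-le={utf16_bytes.hex()}")
```

The code point is the same whichever encoding you pick; only the bytes that end up in the file differ.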

Simon
  • I have one more question about UTF-8. If UTF-8 can encode symbols that require more than 8 bits (the mechanism is described here: http://en.wikipedia.org/wiki/UTF-8#Description), why do we sometimes use UTF-16 or UTF-32? – mtkachenko Sep 17 '13 at 12:01
  • Yes, UTF-8 can represent any character, but many characters cannot be encoded in 8 bits, so UTF-8 has to hold information not just about which Unicode value it represents but also about whether the character stretches over more than 8 bits. When decoding UTF-8 you must process this information. Thus UTF-16/UTF-32 can sometimes be better for performance and sometimes better for memory footprint than UTF-8. However, if you know that you'll mostly use characters with low Unicode values, then UTF-8 is better. – Simon Sep 17 '13 at 12:12
  • Now it's clear to me, thank you. – mtkachenko Sep 17 '13 at 12:13
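
To make the memory-footprint point from the comments concrete, here is a small Python sketch. The function name `encoded_sizes` and the sample strings are assumptions made for illustration. It compares how many bytes the same text takes in each encoding, using the little-endian variants so that no byte order mark is added:

```python
def encoded_sizes(text):
    """Return the byte length of `text` in UTF-8, UTF-16LE and UTF-32LE.

    `encoded_sizes` is a hypothetical helper written for this example only.
    """
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


# Illustrative samples: low code points (ASCII) versus higher ones (Japanese).
samples = {"ascii": "hello", "japanese": "日本語"}
for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For the ASCII sample UTF-8 is the smallest, while for the Japanese sample UTF-16 is smaller than UTF-8, which matches the trade-off described above.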