
I know that Unicode is a huge symbol set. It covers, for example, the Japanese and Chinese writing systems.

I am reading about UTF-8/16/32, but nowhere can I find an explicit statement that absolutely any Unicode symbol can be encoded using UTF-8/16/32.

Is it true that every Unicode encoding has the same expressive power?

If it is true, then what is the reason to use UTF-16/32, given that UTF-8 generally uses memory more sparingly and is ASCII-compatible?

gstackoverflow
  • UTF-8/16/32/Batman are all just different ways of representing sequences of Unicode code points. They're all equally expressive and mutually convertible; they just have different trade-offs in terms of storage and processing. – Kerrek SB Jul 07 '14 at 08:42
  • Possible duplicate of [Do UTF-8, UTF-16, and UTF-32 Unicode encodings differ in the number of characters they can store?](http://stackoverflow.com/questions/130438/do-utf-8-utf-16-and-utf-32-unicode-encodings-differ-in-the-number-of-characters) – Jukka K. Korpela Jul 07 '14 at 09:01

2 Answers


Yes, you can. For all readers: Unicode assigns each character a number (code point) from U+0000 up to U+10FFFF, a range that fits in 3 bytes. UTF-8 is a multi-byte encoding that chains bytes together: the lead byte's high bits signal the sequence length, continuation bytes carry a fixed high-bit marker, and the remaining bits are free to hold the code point's value. UTF-16 also has an escaping mechanism (surrogate pairs) for code points above U+FFFF. And UTF-32 is wide enough to hold any code point in a single unit.

For Asian scripts UTF-8 is not optimal; for Latin script it is. In general that only matters on small devices or in huge databases.
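To illustrate the point (this snippet is my own, not part of the original answer): the same code points round-trip through all three encodings, and only the byte lengths differ. The big-endian variants are used so no BOM is prepended.

```python
# Each character round-trips through UTF-8/16/32; only the size varies.
for ch in ["A", "é", "汉", "😀"]:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")   # BE variant: no byte-order mark
    utf32 = ch.encode("utf-32-be")
    # Every encoding decodes back to the identical character
    assert ch == utf8.decode("utf-8") == utf16.decode("utf-16-be") == utf32.decode("utf-32-be")
    print(f"U+{ord(ch):04X}: utf-8={len(utf8)}B utf-16={len(utf16)}B utf-32={len(utf32)}B")
```

Running this shows 'A' takes 1 byte in UTF-8 but 4 in UTF-32, while '汉' takes 3 bytes in UTF-8 and only 2 in UTF-16, which is the trade-off the answer describes.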

Joop Eggen
  • "For Asian scripts UTF-8 is not optimal, for latin script it is optimal." And yet even though UTF-8 is not an optimal encoding for Asian scripts, it can be the best encoding in many situations *involving* Asian scripts. For example, [the Wikipedia page for the Chinese language, in Chinese](https://zh.wikipedia.org/wiki/汉语) is 318,945 bytes in UTF-8. Convert that same HTML document to UTF-16 (which uses only 2 bytes per Chinese character, while UTF-8 uses 3), and its size becomes 542,040 bytes. The amount of Latin script in HTML overwhelms the space savings from efficiently encoding Chinese. – rmunn Sep 22 '15 at 16:13
  • Therefore, "just use UTF-8 unless you have a very good reason not to" is actually a really good rule of thumb. – rmunn Sep 22 '15 at 16:14

All UTF-x encodings can represent all Unicode codepoint sequences.

With UTF-32, each codepoint requires 4 bytes.

With UTF-16, most codepoints use 2 bytes; codepoints outside the Basic Multilingual Plane use 4 bytes via surrogate pairs.

With UTF-8, a codepoint may use between 1 and 4 bytes.

For European character sets, UTF-8 is the most memory-efficient encoding.

Remy Lebeau
laune