Questions tagged [utf-32]

UTF-32 is a character encoding that represents every Unicode code point in exactly four bytes.

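UTF-32 is a character encoding that represents every Unicode code point as a single 32-bit (four-byte) code unit. That makes it the only fixed-width Unicode encoding, at least at the code point level: a user-perceived character (grapheme cluster) can still span several code points, so it is not fixed-width per displayed character.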
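As a rough illustration of the fixed-width claim and its caveat, the sketch below compares encoded sizes and does constant-time indexing into UTF-32 data. It uses Python's built-in codecs; the helper names `encoded_sizes` and `code_point_at` are made up for this example.

```python
# Byte counts per code point in UTF-8, UTF-16 and UTF-32 (BOM-less variants).
def encoded_sizes(ch):
    """Return (utf-8, utf-16, utf-32) byte counts for a single code point."""
    return tuple(len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in ["A", "\u00e9", "\u20ac", "\U00010428"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+10428 (4, 4, 4)

def code_point_at(data, index):
    """Random access into BOM-less little-endian UTF-32.

    Code point number `index` always occupies bytes 4*index .. 4*index+3.
    """
    return int.from_bytes(data[4 * index : 4 * index + 4], "little")

data = "a\U00010428b".encode("utf-32-le")
print(hex(code_point_at(data, 1)))  # 0x10428

# The "sort-of" caveat: "e" followed by U+0301 (combining acute accent) is
# displayed as one character but is two code points, so eight bytes in UTF-32.
print(len("e\u0301".encode("utf-32-le")))  # 8
```

The same indexing trick does not work for UTF-8 or UTF-16, where the byte offset of the n-th code point depends on everything before it.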

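There are variants of UTF-32 that differ in byte order (endianness): UTF-32BE stores the most significant byte first, UTF-32LE the least significant byte first. Plain UTF-32 may begin with a byte order mark (BOM) that indicates which one is in use.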
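A small sketch of how the variants differ on the wire, using Python's `utf-32-be`, `utf-32-le` and `utf-32` codecs. The function `detect_utf32_byte_order` is a hypothetical helper written for this example, not a standard API.

```python
import codecs

ch = "\U00010428"  # a code point outside the Basic Multilingual Plane

big = ch.encode("utf-32-be")     # most significant byte first, no BOM
little = ch.encode("utf-32-le")  # least significant byte first, no BOM
with_bom = ch.encode("utf-32")   # BOM followed by the platform's native order

print(big.hex(" "))     # 00 01 04 28
print(little.hex(" "))  # 28 04 01 00

def detect_utf32_byte_order(data):
    """Return a codec name based on a leading UTF-32 BOM, or None if absent.

    Note: the UTF-32LE BOM (FF FE 00 00) starts with the UTF-16LE BOM (FF FE),
    so a general-purpose sniffer should test for UTF-32 before UTF-16.
    """
    if data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if data.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    return None

print(detect_utf32_byte_order(with_bom))  # depends on the platform
print(detect_utf32_byte_order(big))       # None

# The generic "utf-32" decoder reads the BOM and strips it from the result.
print(with_bom.decode("utf-32") == ch)    # True
```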

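The UTF-32 encoding form is defined in the Unicode Standard (Chapter 3, Conformance). Encoding is direct: the numeric value of each code point is stored as one 32-bit code unit.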
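The encoding step is small enough to sketch by hand. The following is an illustrative implementation, not a reference one: `utf32_encode` and `utf32_decode` are hypothetical names, and the only validation assumed here is rejecting surrogates (U+D800 to U+DFFF) and values above U+10FFFF.

```python
def utf32_encode(text, byteorder="big"):
    """Encode a str as BOM-less UTF-32: one 4-byte integer per code point."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"surrogate U+{cp:04X} is not a valid scalar value")
        out += cp.to_bytes(4, byteorder)
    return bytes(out)

def utf32_decode(data, byteorder="big"):
    """Decode BOM-less UTF-32 bytes back into a str."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    chars = []
    for i in range(0, len(data), 4):
        cp = int.from_bytes(data[i : i + 4], byteorder)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid code unit 0x{cp:08X}")
        chars.append(chr(cp))
    return "".join(chars)

sample = "a\U00010428"
# The hand-written encoder agrees with Python's built-in codecs.
print(utf32_encode(sample) == sample.encode("utf-32-be"))            # True
print(utf32_encode(sample, "little") == sample.encode("utf-32-le"))  # True
print(utf32_decode(utf32_encode(sample)) == sample)                  # True
```

In practice the built-in codecs are the better choice; the point of the sketch is only that no bit shuffling is involved, unlike UTF-8 or UTF-16.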

Related tags

  • unicode: The character set it serializes
  • Other UTFs: utf-8, utf-16, and some rarely used special-case or obsolete encodings
91 questions
641
votes
14 answers

UTF-8, UTF-16, and UTF-32

What are the differences between UTF-8, UTF-16, and UTF-32? I understand that they will all store Unicode, and that each uses a different number of bytes to represent a character. Is there an advantage to choosing one over the other?
user60456
86
votes
5 answers

What's the point of UTF-16?

I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32, since UTF-16 is still variable length. If you don't need this, then…
dsimcha
31
votes
2 answers

Utf8_general_ci or utf8mb4 or...?

utf16 or utf32? I'm trying to store content in a lot of languages. Some of the languages use double-wide fonts (for example, Japanese fonts are frequently twice as wide as English fonts). I'm not sure which kind of database I should be using. …
Wolfpack'08
29
votes
1 answer

Why is there no UTF-24?

Possible Duplicate: Why UTF-32 exists whereas only 21 bits are necessary to encode every character? The maximum Unicode code point is 0x10FFFF in UTF-32. UTF-32 has 21 information bits and 11 superfluous blank bits. So why is there no UTF-24…
Anthony Faull
27
votes
3 answers

Does Unicode have a defined maximum number of code points?

I have read many articles in order to know what is the maximum number of the Unicode code points, but I did not find a final answer. I understood that the Unicode code points were minimized to make all of the UTF-8 UTF-16 and UTF-32 encodings able…
user4344762
21
votes
3 answers

What Character Encoding is best for multinational companies

If you had a website that was to be translated into every language in the world and therefore had a database with all these translations what character encoding would be best? UTF-128? If so do all browsers understand the chosen encoding? Is…
HGPB
15
votes
2 answers

How do I create a string with a surrogate pair inside of it?

I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair…
michael
12
votes
1 answer

How to write 3 bytes unicode literal in Java?

I'd like to write unicode literal U+10428 in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I I tried with '\u10428' and it doesn't compile.
kawty
9
votes
2 answers

No UTF-32 big-endian in C#?

In C#, Encoding.UTF32 is UTF-32 little-endian, Encoding.BigEndianUnicode is UTF-16 big-endian, Encoding.Unicode is UTF-16 little-endian. But I can't find any for UTF-32 big-endian. I'm developing a simple textviewer and don't think there are many…
Jenix
9
votes
2 answers

Reading/writing/printing UTF-8 in C++11

I have been exploring C++11's new Unicode functionality, and while other C++11 encoding questions have been very helpful, I have a question about the following code snippet from cppreference. The code writes and then immediately reads a text file…
Ephemera
8
votes
5 answers

UTF32 and C# problems

So I've got some troubles with character encoding. When I put the following two characters into a UTF32 encoded text file: 鸕 and then run this code on them: System.IO.StreamReader streamReader = new System.IO.StreamReader("input",…
AStupidNoob
8
votes
2 answers

How to get a reliable unicode character count in Python?

Google App Engine uses Python 2.5.2, apparently with UCS4 enabled. But the GAE datastore uses UTF-8 internally. So if you store u'\ud834\udd0c' (length 2) to the datastore, when you retrieve it, you get '\U0001d10c' (length 1). I'm trying to count…
Travis
8
votes
1 answer

Conversion from wstring to u16string and back (standard conform) in C++17 / C++20

My main platform is Windows which is the reason why I use internally UTF-16 (mostly BMP strings). I would like to use console output for these strings. Unfortunately there is no std::u16cout or std::u8cout so I need to use std::wcout. Therefore I…
Bernd
7
votes
2 answers

What open source C or C++ libraries can convert arbitrary UTF-32 to NFC?

What open source C or C++ libraries can convert arbitrary UTF-32 to NFC? Libraries that I think can do this so far: ICU, Qt, GLib (not sure?). I don't need any other complex Unicode support; just conversion from arbitrary but known-correct UTF-32 to…
wjl
7
votes
2 answers

How to detect unicode string width in terminal?

I'm working on a terminal based program that has unicode support. There are certain cases where I need to determine how many terminal columns a string will consume before I print it. Unfortunately some characters are 2 columns wide (chinese,…
KyleL