Questions tagged [utf-32]

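UTF-32 is a character encoding that represents every Unicode code point in exactly four bytes. That makes it the only fixed-width Unicode encoding, at least at the code point level: a user-perceived character built from several code points (for example, a base letter followed by a combining mark) still occupies more than one four-byte unit.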
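As a rough illustration of the fixed-width property, the following Python 3 sketch compares how many bytes a single code point takes in each encoding. The helper name `encoded_sizes` and the sample characters are made up for this example; the `-le` codec variants are used so that no byte order mark is added to the counts.

```python
# Byte length of one code point in three Unicode encodings.
# The "-le" codecs are used so that no BOM inflates the result.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(ch):
    """Return {encoding name: number of bytes} for the string ch."""
    return {name: len(ch.encode(name)) for name in ENCODINGS}

if __name__ == "__main__":
    # ASCII letter, Latin-1 letter, CJK ideograph, supplementary-plane letter
    for ch in ["A", "\u00e9", "\u9e15", "\U00010428"]:
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))

    # Fixed width holds per code point, not per user-perceived character:
    # "e" plus a combining acute accent is two code points, so eight bytes.
    print(len("e\u0301".encode("utf-32-le")))
```

Only the UTF-32 column stays constant across the samples; UTF-8 ranges from one to four bytes and UTF-16 from two to four.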

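UTF-32 comes in variants that differ in byte order: UTF-32BE (big-endian) and UTF-32LE (little-endian). Plain UTF-32 may begin with a byte order mark (BOM) that signals which of the two is in use.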

The UTF-32 encoding form and its encoding schemes are defined in the Unicode Standard (Chapter 3, Conformance); RFC 2781 covers UTF-16.
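Encoding a code point as UTF-32 amounts to writing its numeric value as a 32-bit unsigned integer in the chosen byte order. The Python 3 sketch below does this by hand with `struct` and can be checked against the built-in `utf-32-be` and `utf-32-le` codecs. The function name `encode_utf32` and its `byteorder` parameter are hypothetical, chosen only for this illustration.

```python
import struct

def encode_utf32(text, byteorder="big"):
    """Encode text as UTF-32 without a BOM (hypothetical helper).

    Each code point becomes one 32-bit unsigned integer, written
    big-endian (UTF-32BE) or little-endian (UTF-32LE).
    """
    if byteorder not in ("big", "little"):
        raise ValueError("byteorder must be 'big' or 'little'")
    fmt = ">I" if byteorder == "big" else "<I"
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Surrogate code points are not valid scalar values in UTF-32.
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"cannot encode surrogate U+{cp:04X}")
        out += struct.pack(fmt, cp)
    return bytes(out)

if __name__ == "__main__":
    sample = "A\U00010428"
    print(encode_utf32(sample, "big").hex(" "))
    print(encode_utf32(sample, "little").hex(" "))
    # The plain "utf-32" codec also prepends a 4-byte BOM.
    print(len("A".encode("utf-32")))
```

Python's generic `utf-32` codec writes a BOM followed by the data in the platform's native byte order, which is why the last line reports eight bytes for a single character.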

Related tags

unicode: the character set that UTF-32 serializes
Other UTFs: utf-8, utf-16, and some rarely used special-case or obsolete encodings

91 questions

641 votes

14 answers

UTF-8, UTF-16, and UTF-32

What are the differences between UTF-8, UTF-16, and UTF-32? I understand that they will all store Unicode, and that each uses a different number of bytes to represent a character. Is there an advantage to choosing one over the other?

unicode utf-8 utf-16 utf utf-32

asked Jan 30 '09 at 17:05

user60456

5 answers

What's the point of UTF-16?

I've never understood the point of UTF-16 encoding. If you need to be able to treat strings as random access (i.e. a code point is the same as a code unit) then you need UTF-32, since UTF-16 is still variable length. If you don't need this, then…

utf-8 character-encoding utf-16 utf utf-32

asked Mar 13 '11 at 20:28

dsimcha

2 answers

Utf8_general_ci or utf8mb4 or...?

utf16 or utf32? I'm trying to store content in a lot of languages. Some of the languages use double-wide fonts (for example, Japanese fonts are frequently twice as wide as English fonts). I'm not sure which kind of database I should be using. …

utf-8 localization utf-16 utf-32 utf8mb4

asked Jul 18 '12 at 02:19

Wolfpack'08

1 answer

Why is there no UTF-24?

Possible Duplicate: Why UTF-32 exists whereas only 21 bits are necessary to encode every character? The maximum Unicode code point is 0x10FFFF in UTF-32. UTF-32 has 21 information bits and 11 superfluous blank bits. So why is there no UTF-24…

unicode character-encoding utf-32

asked Apr 13 '12 at 15:32

Anthony Faull

3 answers

Does Unicode have a defined maximum number of code points?

I have read many articles in order to know what is the maximum number of the Unicode code points, but I did not find a final answer. I understood that the Unicode code points were minimized to make all of the UTF-8 UTF-16 and UTF-32 encodings able…

unicode utf-8 utf-16 codepoint utf-32

asked Dec 11 '14 at 05:26

user4344762

3 answers

What Character Encoding is best for multinational companies

If you had a website that was to be translated into every language in the world and therefore had a database with all these translations what character encoding would be best? UTF-128? If so do all browsers understand the chosen encoding? Is…

utf-8 character-encoding utf-16 utf-32

asked Apr 20 '11 at 15:43

HGPB

2 answers

How do I create a string with a surrogate pair inside of it?

I saw this post on Jon Skeet's blog where he talks about string reversing. I wanted to try the example he showed myself, but it seems to work... which leads me to believe that I have no idea how to create a string that contains a surrogate pair…

c# string utf-16 utf-32 surrogate-pairs

asked Jan 15 '13 at 22:06

michael

1 answer

How to write 3 bytes unicode literal in Java?

I'd like to write unicode literal U+10428 in Java. http://www.marathon-studios.com/unicode/U10428/Deseret_Small_Letter_Long_I I tried with '\u10428' and it doesn't compile.

java unicode utf-16 utf-32 unicode-literals

asked Jul 08 '14 at 13:35

kawty

2 answers

No UTF-32 big-endian in C#?

In C#, Encoding.UTF32 is UTF-32 little-endian, Encoding.BigEndianUnicode is UTF-16 big-endian, Encoding.Unicode is UTF-16 little-endian. But I can't find any for UTF-32 big-endian. I'm developing a simple textviewer and don't think there are many…

c# text encoding endianness utf-32

asked Oct 06 '15 at 15:23

Jenix

2 answers

Reading/writing/printing UTF-8 in C++11

I have been exploring C++11's new Unicode functionality, and while other C++11 encoding questions have been very helpful, I have a question about the following code snippet from cppreference. The code writes and then immediately reads a text file…

utf-8 c++11 wchar-t utf-32 codecvt

asked Mar 18 '13 at 09:10

Ephemera

5 answers

UTF32 and C# problems

So I've got some troubles with character encoding. When I put the following two characters into a UTF32 encoded text file: 鸕 and then run this code on them: System.IO.StreamReader streamReader = new System.IO.StreamReader("input",…

c# encoding mono gedit utf-32

asked Apr 03 '12 at 05:44

AStupidNoob

2 answers

How to get a reliable unicode character count in Python?

Google App Engine uses Python 2.5.2, apparently with UCS4 enabled. But the GAE datastore uses UTF-8 internally. So if you store u'\ud834\udd0c' (length 2) to the datastore, when you retrieve it, you get '\U0001d10c' (length 1). I'm trying to count…

python google-app-engine unicode utf-16 utf-32

asked Aug 03 '11 at 06:26

Travis

1 answer

Conversion from wstring to u16string and back (standard conform) in C++17 / C++20

My main platform is Windows which is the reason why I use internally UTF-16 (mostly BMP strings). I would like to use console output for these strings. Unfortunately there is no std::u16cout or std::u8cout so I need to use std::wcout. Therefore I…

c++ c++17 utf-16 wstring utf-32

asked Apr 20 '20 at 13:19

Bernd

2 answers

What open source C or C++ libraries can convert arbitrary UTF-32 to NFC?

What open source C or C++ libraries can convert arbitrary UTF-32 to NFC? Libraries that I think can do this so far: ICU, Qt, GLib (not sure?). I don't need any other complex Unicode support; just conversion from arbitrary but known-correct UTF-32 to…

c++ unicode open-source utf-32

asked Nov 24 '11 at 06:35

wjl

2 answers

How to detect unicode string width in terminal?

I'm working on a terminal based program that has unicode support. There are certain cases where I need to determine how many terminal columns a string will consume before I print it. Unfortunately some characters are 2 columns wide (chinese,…

c++ linux unicode utf-8 utf-32

asked May 23 '16 at 17:30

KyleL
