Questions tagged [ucs-4]

Universal Character Set-4 is a 31-bit encoding form defined by the original ISO 10646, and has largely been replaced by UTF-32. It can represent up to 2,147,483,648 characters from `0x00000000` to `0x7FFFFFFF`. Use this tag when you are specifically dealing with UCS-4.

Universal Character Set-4 (UCS-4) is a precursor to the Unicode encoding forms. It is a fixed-length character encoding in which each character takes up 32 bits, or four bytes (hence the '4' in UCS-4).

The leading sign bit is unused, leaving 31 bits to encode each of the 2,147,483,648 possible characters, from 0x00000000 to 0x7FFFFFFF.

UCS-4 has been superseded by UTF-32, in which each of the 1,114,112 possible Unicode code points (17 planes of 65,536 code points each) takes up four bytes, and only code points 0x0000 to 0x10FFFF are considered to be in range. For code points in that range, the UTF-32 and UCS-4 encodings are identical, so UCS-4 covers every Unicode character that a UTF format can encode.
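
The difference between the two ranges can be sketched in Python. This is a minimal illustration, not a reference implementation: the helper names `decode_ucs4_be`, `in_ucs4_range`, and `in_unicode_range` are hypothetical, and the sketch assumes big-endian input and checks only the numeric range of each value.

```python
import struct

UCS4_MAX = 0x7FFFFFFF    # 31-bit limit of the original ISO 10646 UCS-4
UNICODE_MAX = 0x10FFFF   # last code point of plane 16 (17 planes in total)


def decode_ucs4_be(data):
    """Split big-endian 4-byte units into a list of integer code values."""
    if len(data) % 4:
        raise ValueError("UCS-4 data must be a multiple of 4 bytes long")
    return [value for (value,) in struct.iter_unpack(">I", data)]


def in_ucs4_range(value):
    """True if the value fits in the 31 bits that UCS-4 allows."""
    return 0 <= value <= UCS4_MAX


def in_unicode_range(value):
    """True if the value is within the range that UTF-32 accepts."""
    return 0 <= value <= UNICODE_MAX


values = decode_ucs4_be(bytes.fromhex("00000030" "0001f606" "00110000"))
for value in values:
    print(f"0x{value:08X}", in_ucs4_range(value), in_unicode_range(value))
```

The last value, 0x00110000, fits in 31 bits and so passes the UCS-4 check, but it lies just past 0x10FFFF and fails the Unicode range check.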

Examples of UCS-4 encodings (all of them big endian):

Character '0' is stored as 0x00000030, using four bytes, rather than one-byte 0x30 in ASCII or UTF-8, or two-byte 0x0030 in UTF-16.
Replacement character '�' is stored as 0x0000FFFD, again using four bytes, rather than three-byte 0xEF 0xBF 0xBD in UTF-8 or two-byte 0xFFFD in UTF-16.
Emoji '😆' is stored as 0x0001F606, again using four bytes, rather than the surrogate pair 0xD83D 0xDE06 in UTF-16 or the four-byte sequence 0xF0 0x9F 0x98 0x86 in UTF-8.
Code points above 0x10FFFF are not in Unicode range and are not to be used.
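
The byte sequences listed above can be reproduced with Python's built-in codecs. This is a sketch that uses the `utf-32-be` codec as a stand-in for big-endian UCS-4, which is an assumption that holds only for code points up to 0x10FFFF; `encodings` is a hypothetical helper name.

```python
def encodings(ch):
    """Return the hex bytes of one character in three big-endian encodings."""
    return {
        "utf-32-be": ch.encode("utf-32-be").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-8": ch.encode("utf-8").hex(),
    }


# '0', the replacement character U+FFFD, and the emoji U+1F606
for ch in ("0", "\ufffd", "\U0001F606"):
    print(f"U+{ord(ch):04X}", encodings(ch))
```

Every character comes out as exactly eight hex digits (four bytes) under `utf-32-be`, while the UTF-16 and UTF-8 results vary in length.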

Related Tags:

utf-32, UCS-4's most direct successor
utf-8, utf-16, other Unicode encodings
unicode, ucs
ucs2, where each of the 65536 characters takes up two bytes

9 questions

1 answer

Python unicode - UCS2 vs UCS4

I came across a scenario where I have to choose between UCS-2 and UCS-4. What is the significance of UCS-2 vs UCS-4 related to Python? How are they different?

python-2.7 ucs2 ucs-4

asked Mar 29 '17 at 22:28

user3571631

1 answer

Read UTF-8 file into UCS-4 string

I am trying to read a UTF-8 encoded file into a UTF-32 (UCS-4) string. Basically internally I want a fixed size character internally to the application. Here I want to make sure the translation is done as part of the stream processes (because that…

c++ utf-8 ucs-4

asked Jan 27 '16 at 03:29

Martin York

1 answer

Build Python as UCS-4 via pyenv

I run into this issue ImportError numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString installing Python in a pyenv-virtualenv environment. In my case, it happens with the matplotlib package instead of numpy (as in the above…

python ucs2 pyenv ucs ucs-4

asked Aug 13 '16 at 03:37

Gabriel

0 answers

How to make basemap in matplotlib work in python build with UCS4

I just build python with UCS-4, everything is working great! except one: basemap in matplotlib. I am getting the following error while importing Basemap: ImportError: .../python2.7/site-packages/mpl_toolkits/basemap/_proj.so: undefined symbol:…

python matplotlib matplotlib-basemap ucs-4

asked Mar 16 '16 at 16:26

innoSPG

0 answers

ConvertUTF16toUCS4 in Apache Xerces

The source code for Apache Xerces: ConvertUTF16toUCS4 is: ConversionResult ConvertUTF16toUCS4( UTF16 **sourceStart, UTF16 *sourceEnd, UCS4 **targetStart, const UCS4 *targetEnd) { ConversionResult result = ok; register UTF16 *source =…

c++ apache utf-16 xerces ucs-4

asked Jul 02 '15 at 07:08

UnderWood

2 answers

convert ucs-4 to ucs-2

The unicode value of ucs-4 character '🤣' is 0001f923, it gets auto changed to the corresponding value of \uD83E\uDD23 when being copied into java code in intelliJ IDEA. Java only supports ucs-2, so there occurs a transformation from ucs-4 to ucs-2.…

java ucs2 ucs-4

asked Sep 16 '19 at 09:37

wongoo

0 answers

Create Virtualenv in Ubuntu 14.04 having Python UCS4

I need to create a virtual environment on Ubuntu with Python 2.7.14 having UCS4 encoding. It seems that by default I get an UCS2. Is there a flag to pass to the virtualenv command in order to set the right encoding, or maybe it is a setting that I…

python ubuntu virtualenv ucs ucs-4

asked Oct 16 '18 at 14:55

Andreampa

0 answers

Python ucs-4/ucs-2 incompatibility

I got incompatibility issue trying to link my Python interpreter(compiled with UCS-4) with the local MPICH2 library(compiled with UCS-2). the error message is shown as below: ImportError:…

python python-2.7 mpi4py ucs2 ucs-4

asked Aug 08 '16 at 21:03

M.Gu

1 answer

Embedded Cython UCS2 UCS4 issue

I have some python code I am trying to embed within c++ code using the cython api construct. For testing purposes I have been working off of: Example program of Cython as Python to C Converter With the slightly modified code: #foo.pyx cdef public…

python c++ cython ucs2 ucs-4

asked Aug 03 '16 at 21:45

someoneGeorge