Questions tagged [ucs-4]

Universal Character Set-4 is a 31-bit encoding form defined by the original ISO 10646, and is largely replaced by UTF-32. It can represent up to 2,147,483,648 characters from `0x00000000` to `0x7FFFFFFF`. Use this tag when you are specifically dealing with UCS-4.

Universal Character Set-4 (UCS-4) is a precursor to the UTF-32 encoding form. It is a fixed-length character encoding in which each character takes up 32 bits, or four bytes (hence the '4' in UCS-4).

The leading (sign) bit is unused, leaving 31 bits to encode up to 2,147,483,648 characters, from 0x00000000 to 0x7FFFFFFF.

UCS-4 is now superseded by UTF-32, in which each of the 1,114,112 possible Unicode code points (17 planes of 65,536 code points each) takes up four bytes, and only code points 0x0000 to 0x10FFFF are considered to be in range. Within that range, the UTF-32 encoding of a code point is identical to its UCS-4 encoding. UCS-4 therefore covers all Unicode characters that can be encoded by a UTF format.
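The two ranges can be checked with a little arithmetic. The following is a minimal sketch in Python; the constant and function names (`UCS4_MAX`, `UNICODE_MAX`, `in_ucs4_range`, `in_unicode_range`) are made up for illustration and do not come from any standard library. It only compares numeric ranges and does not treat the surrogate block 0xD800 to 0xDFFF specially.

```python
# Hypothetical names, chosen for this illustration only.
UCS4_MAX = 0x7FFFFFFF     # largest value that fits in 31 bits
UNICODE_MAX = 0x10FFFF    # last code point of the 17th plane

PLANES = 17
PLANE_SIZE = 0x10000      # 65,536 code points per plane


def in_ucs4_range(value: int) -> bool:
    """True if value fits in the 31-bit UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def in_unicode_range(value: int) -> bool:
    """True if value lies in the range UTF-32 allows (numeric check only)."""
    return 0 <= value <= UNICODE_MAX


# The counts quoted in the text follow directly from these limits.
assert UCS4_MAX + 1 == 2**31 == 2_147_483_648
assert PLANES * PLANE_SIZE == UNICODE_MAX + 1 == 1_114_112
```

A value such as 0x00110000 passes `in_ucs4_range` but fails `in_unicode_range`, which is the part of the UCS-4 space that UTF-32 leaves out.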

Examples of UCS-4 encodings (all of them big endian):

  • Character '0' is stored as 0x00000030, using four bytes, rather than one-byte 0x30 in ASCII or UTF-8, or two-byte 0x0030 in UTF-16.
  • Replacement character '�' is stored as 0x0000FFFD, again using four bytes, rather than three-byte 0xEF 0xBF 0xBD in UTF-8 or two-byte 0xFFFD in UTF-16.
  • Emoji '😆' is stored as 0x0001F606, again using four bytes, rather than the surrogate pair 0xD83D 0xDE06 in UTF-16 or the four bytes 0xF0 0x9F 0x98 0x86 in UTF-8.
  • Code points above 0x10FFFF are not in Unicode range and are not to be used.
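The byte sequences listed above can be reproduced with Python's built-in codecs. This is a small sketch, assuming that the `utf-32-be` codec stands in for big-endian UCS-4 (the two agree for every code point up to 0x10FFFF); the helper name `encodings` is hypothetical.

```python
def encodings(ch: str) -> dict:
    """Return the big-endian byte sequences of one character as hex strings."""
    return {
        # utf-32-be is used here as a stand-in for big-endian UCS-4.
        "ucs4": ch.encode("utf-32-be").hex(),
        "utf16": ch.encode("utf-16-be").hex(),
        "utf8": ch.encode("utf-8").hex(),
    }


# The three example characters: '0', U+FFFD, and U+1F606.
for ch in ("0", "\ufffd", "\U0001f606"):
    print(f"U+{ord(ch):04X}", encodings(ch))
```

Every character yields exactly eight hex digits (four bytes) under `ucs4`, while the `utf16` and `utf8` lengths vary with the code point.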

9 questions
3
votes
1 answer

Python unicode - UCS2 vs UCS4

I came across a scenario where I have to choose between UCS-2 and UCS-4. What is the significance of UCS-2 vs UCS-4 related to Python? How are they different?
user3571631
3
votes
1 answer

Read UTF-8 file into UCS-4 string

I am trying to read a UTF-8 encoded file into a UTF-32 (UCS-4) string. Basically, I want a fixed-size character internally to the application. Here I want to make sure the translation is done as part of the stream processing (because that…
Martin York
1
vote
1 answer

Build Python as UCS-4 via pyenv

I run into this issue ImportError numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString installing Python in a pyenv-virtualenv environment. In my case, it happens with the matplotlib package instead of numpy (as in the above…
Gabriel
1
vote
0 answers

How to make basemap in matplotlib work in python build with UCS4

I just built Python with UCS-4 and everything is working great, except one thing: basemap in matplotlib. I am getting the following error while importing Basemap: ImportError: .../python2.7/site-packages/mpl_toolkits/basemap/_proj.so: undefined symbol:…
innoSPG
1
vote
0 answers

ConvertUTF16toUCS4 in Apache Xerces

The source code for Apache Xerces: ConvertUTF16toUCS4 is: ConversionResult ConvertUTF16toUCS4( UTF16 **sourceStart, UTF16 *sourceEnd, UCS4 **targetStart, const UCS4 *targetEnd) { ConversionResult result = ok; register UTF16 *source =…
UnderWood
0
votes
2 answers

convert ucs-4 to ucs-2

The Unicode value of the UCS-4 character '🤣' is 0001f923; it gets automatically changed to the corresponding value \uD83E\uDD23 when copied into Java code in IntelliJ IDEA. Java only supports UCS-2, so a transformation from UCS-4 to UCS-2 occurs.…
wongoo
0
votes
0 answers

Create Virtualenv in Ubuntu 14.04 having Python UCS4

I need to create a virtual environment on Ubuntu with Python 2.7.14 having UCS4 encoding. It seems that by default I get an UCS2. Is there a flag to pass to the virtualenv command in order to set the right encoding, or maybe it is a setting that I…
Andreampa
0
votes
0 answers

Python ucs-4/ucs-2 incompatibility

I got an incompatibility issue trying to link my Python interpreter (compiled with UCS-4) with the local MPICH2 library (compiled with UCS-2). The error message is shown below: ImportError:…
M.Gu
0
votes
1 answer

Embedded Cython UCS2 UCS4 issue

I have some Python code I am trying to embed within C++ code using the Cython API construct. For testing purposes I have been working off of: Example program of Cython as Python to C Converter With the slightly modified code: #foo.pyx cdef public…