Questions tagged [ucs2]

UCS-2 (2-byte Universal Character Set) is an early fixed-width, 16-bit encoding of Unicode that has been superseded by UTF-16.

UCS-2 is limited to 65,536 code points (U+0000 through U+FFFF, the Basic Multilingual Plane) and produces a fixed-length format by simply using the code point as the 16-bit code unit.

In UCS-2 each character is represented by 16 bits or 2 bytes. (The number 2 in UCS-2 indicates 2 bytes.)

For example:

Uppercase A (U+0041) is represented by the 16-bit value 0x0041. This encoding is no longer sufficient because it cannot represent code points above U+FFFF, and it has been superseded by the UTF-16 encoding.

UCS-2 was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.
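
As a rough illustration of the "code point as the 16-bit code unit" rule, here is a minimal Python sketch. The helper name `encode_ucs2_be` is hypothetical (it is not a standard library function), and big-endian byte order is an assumption made for the example. For characters in the Basic Multilingual Plane the result matches Python's built-in `utf-16-be` codec; anything above U+FFFF is rejected, which is exactly the case UTF-16 handles with a surrogate pair.

```python
import struct


def encode_ucs2_be(text: str) -> bytes:
    """Hypothetical helper: encode text as big-endian UCS-2.

    Each code point is written directly as one 16-bit code unit.
    Code points above U+FFFF, and surrogate code points, have no
    UCS-2 representation, so they raise ValueError.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot encode it")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate, not a character")
        out += struct.pack(">H", cp)  # one big-endian 16-bit unit per code point
    return bytes(out)


print(encode_ucs2_be("A").hex())                 # 0041
print("A".encode("utf-16-be").hex())             # 0041 (identical inside the BMP)
print("\U0001F600".encode("utf-16-be").hex())    # d83dde00 (UTF-16 surrogate pair)
```

Decoding genuine UCS-2 data with a UTF-16 codec generally works, since well-formed UCS-2 text contains no surrogates; the reverse is only safe when the text stays inside the BMP.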

121 questions
64
votes
7 answers

How to find out if Python is compiled with UCS-2 or UCS-4?

Just what the title says. $ ./configure --help | grep -i ucs --enable-unicode[=ucs[24]] Searching the official documentation, I found this: sys.maxunicode: An integer giving the largest supported code point for a Unicode character. The value…
Sridhar Ratnakumar
  • 81,433
  • 63
  • 146
  • 187
27
votes
4 answers

What version of Unicode is supported by which .NET platform and on which version of Windows in regards to character classes?

Updated question ¹ With regards to character classes, comparison, sorting, normalization and collations, what Unicode version or versions are supported by which .NET platforms? Original question I remember somewhat vaguely having read that .NET…
Abel
  • 56,041
  • 24
  • 146
  • 247
15
votes
1 answer

Python 3: reading UCS-2 (BE) file

I can't seem to be able to decode UCS-2 BE files (legacy stuff) under Python 3.3, using the built-in open() function (stack trace shows UnicodeDecodeError and contains my readLine() method) - in fact, I wasn't able to find a flag for specifying this…
elder elder
  • 645
  • 1
  • 9
  • 23
13
votes
3 answers

How to convert a Unicode text-block to UTF-8 (HEX) code point?

I have a Unicode text-block, like this: ụ ư ứ Ỳ Ỷ Ỵ Đ Now, I want to convert this original Unicode text-block into a text-block of UTF-8 (HEX) code points (see the Hexadecimal UTF-8 column, on this page: https://en.wikipedia.org/wiki/UTF-8), using PHP;…
user5132285
12
votes
1 answer

What is the maximum number of characters in an USSD message?

I've understood that a USSD message consists of 160 bytes. For 7-bit data coding schemes, the maximum number of characters is 160*8/7, which gives 182 characters. It's unclear to me what the maximum number of characters is for UCS2 encoding.…
Victor Ionescu
  • 1,967
  • 2
  • 21
  • 24
12
votes
1 answer

python base64 string decoding

I've got what's supposed to be a UCS-2 encoded XML document, from which I've managed to build a DOM using minidom after some tweaking. The issue is that I'm supposed to have some data encoded in base64. I know for a fact that: AME= (or…
bleeding edge
  • 123
  • 1
  • 1
  • 4
12
votes
8 answers

C++ strings: UTF-8 or 16-bit encoding?

I'm still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or some 16-bit string (implemented as std::wstring). The project is a…
Carl Seleborg
  • 13,125
  • 11
  • 58
  • 70
11
votes
3 answers

'UCS-2' codec can't encode characters in position 1050-1050

When I run my Python code, I get the following errors: File "E:\python343\crawler.py", line 31, in print (x1) File "E:\python343\lib\idlelib\PyShell.py", line 1347, in write return self.shell.write(s,…
Andi
  • 133
  • 1
  • 1
  • 13
10
votes
3 answers

best way to detect number of SMS needed to send a text

I'm looking for code or a library in PHP that I can pass a text to, and it will tell me: which encoding I need to use in order to send this text as SMS (7, 8, or 16 bit), and how many SMS messages I will need to send this text (it must be smart to…
AFT
  • 45
  • 7
  • 22
9
votes
2 answers

What are the consequences of storing a C# string (UTF-16) in a SQL Server nvarchar (UCS-2) column?

It seems that SQL Server uses Unicode UCS-2, a 2-byte fixed-length character encoding, for nchar/nvarchar fields. Meanwhile, C# uses Unicode UTF-16 encoding for its strings (note: Some people don't consider UCS-2 to be Unicode, but it encodes all…
Triynko
  • 18,766
  • 21
  • 107
  • 173
9
votes
1 answer

R: can't read unicode text files even when specifying the encoding

I'm using R 3.1.1 on Windows 7 32bits. I'm having a lot of problems reading some text files on which I want to perform textual analysis. According to Notepad++, the files are encoded with "UCS-2 Little Endian". (grepWin, a tool whose name says it…
s_a
  • 885
  • 3
  • 9
  • 22
8
votes
1 answer

Change encoding (collation?) of SQL Server 2008 R2 to UTF-8

We'd like to move our Confluence system to a SQL Server 2008 R2. Now, since Confluence uses UTF-8 encoding, I'd need a database using the same encoding (I guess that's the collation?). There's the command alter database confluence set collation…
Ahatius
  • 4,777
  • 11
  • 49
  • 79
7
votes
3 answers

Storing UTF-16/Unicode data in SQL Server

According to this, SQL Server 2K5 uses UCS-2 internally. It can store UTF-16 data in UCS-2 (with appropriate data types, nchar etc), however if there is a supplementary character this is stored as 2 UCS-2 characters. This brings the obvious issues…
David Cameron
6
votes
2 answers

UCS-2 and SQL Server

While researching options for storing mostly-English-but-sometimes-not data in a SQL Server database that can potentially be quite large, I'm leaning toward storing most string data as UTF-8 encoded. However, Microsoft chose UCS-2 for reasons that I…
Eric J.
  • 147,927
  • 63
  • 340
  • 553
5
votes
4 answers

2-byte (UCS-2) wide strings under GCC

When porting my Visual C++ project to GCC, I found out that the wchar_t datatype is 4-byte UTF-32 by default. I could override that with a compiler option, but then the whole wcs* (wcslen, wcscmp, etc.) part of RTL is rendered unusable, since it…
Seva Alekseyev
  • 59,826
  • 25
  • 160
  • 281