Questions tagged [unicode-string]

Use this tag for questions about Unicode strings, such as strings encoded in UTF-8.

Quoting this answer:

Unicode is a standard for working with a wide range of characters. Each symbol has a codepoint (a number), and these codepoints can be encoded (converted to a sequence of bytes) using a variety of encodings.

Note that this tag can be used with any programming environment that uses Unicode strings. You should also tag the question with the tag of your programming environment.
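
As a rough illustration of the distinction quoted above between a code point and its encodings, here is a minimal Python sketch. The helper name `describe` and the choice of the euro sign are assumptions made purely for this example; they do not come from the quoted answer.

```python
def describe(symbol: str) -> dict:
    """Return the code point of one symbol and its bytes in a few encodings.

    Hypothetical helper for illustration only.
    """
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {
        # The code point is just a number; U+XXXX is the usual notation.
        "code_point": f"U+{ord(symbol):04X}",
        # Each encoding turns that same number into a different byte sequence.
        "bytes": {name: symbol.encode(name).hex() for name in encodings},
    }


info = describe("€")  # arbitrary sample symbol
print(info["code_point"])
for name, hex_bytes in info["bytes"].items():
    # Two hex digits per byte, so the byte count is half the string length.
    print(name, hex_bytes, len(hex_bytes) // 2, "bytes")
```

The same code point yields three, two, and four bytes under UTF-8, UTF-16, and UTF-32 respectively, which is why "length" questions under this tag so often depend on which unit is being counted.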

647 questions
149
votes
8 answers

Why is the length of this string longer than the number of characters in it?

This code: string a = "abc"; string b = "AC"; Console.WriteLine("Length a = {0}", a.Length); Console.WriteLine("Length b = {0}", b.Length); outputs: Length a = 3 Length b = 4 Why? The only thing I could imagine is that the Chinese character is 2…
weini37
  • 1,455
  • 3
  • 10
  • 9
93
votes
8 answers

How can I print wchar_t values to the console?

Example: #include <iostream> using namespace std; int main() { wchar_t en[] = L"Hello"; wchar_t ru[] = L"Привет"; //Russian language cout << ru << endl << en; return 0; } This code only prints HEX-values like…
zed91
  • 1,073
  • 1
  • 8
  • 10
64
votes
9 answers

What is the range of Unicode Printable Characters?

Can anybody please tell me what the range of Unicode printable characters is? [e.g. the ASCII printable character range is \u0020 - \u007f]
Anindya Chatterjee
  • 5,824
  • 13
  • 58
  • 82
57
votes
5 answers

Java Unicode String length

I am trying hard to get the character count of a Unicode string and have tried various options. It looks like a small problem, but I am stuck in a big way. Here I am trying to get the length of the string str1. I am getting it as 6. But actually it is 3. moving the cursor…
user1611248
  • 708
  • 3
  • 7
  • 13
47
votes
3 answers

Convert between string, u16string & u32string

I've been looking for a way to convert between the Unicode string types and came across this method. Not only do I not completely understand the method (there are no comments) but also the article implies that in future there will be better…
DrYap
  • 6,525
  • 2
  • 31
  • 54
45
votes
4 answers

Python 3: os.walk() file paths UnicodeEncodeError: 'utf-8' codec can't encode: surrogates not allowed

This code: for root, dirs, files in os.walk('.'): print(root) Gives me this error: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 27: surrogates not allowed How do I walk through a file tree without getting toxic…
Collin Anderson
  • 14,787
  • 6
  • 68
  • 57
24
votes
2 answers

Unicode file in notepad

What does it mean when I save a text file as "Unicode" in Notepad? Is it UTF-8, UTF-16 or UTF-32? Thanks in advance.
FSm
  • 2,017
  • 7
  • 29
  • 55
24
votes
7 answers

Return code point of characters in C#

How can I return the Unicode code point of a character? For example, if the input is "A", then the output should be "U+0041". Ideally, a solution should take care of surrogate pairs. By code point I mean the actual code point according to Unicode,…
FSm
  • 2,017
  • 7
  • 29
  • 55
20
votes
1 answer

Python-3 and \x Vs \u Vs \U in string encoding and why

Why do we have different byte oriented string representations in Python 3? Won't it be enough to have single representation instead of multiple? For ASCII range number printing a string shows a sequence starting with \x: In [56]: chr(128) Out[56]:…
MaNKuR
  • 2,578
  • 1
  • 19
  • 31
16
votes
2 answers

iOS Localization: Unicode character escape sequences, which have the form '\uxxxx', do not work

We have key-value pair in Localization.string file. "spanish-key" = "Espa\u00f1ol"; When we fetch and assign to label then app displays it as "Espau00f1ol". Doesn't work. self.label1.text= NSLocalizedString(@"spanish-key", nil); It works- shows in…
Gaurav Borole
  • 796
  • 2
  • 13
  • 32
15
votes
6 answers

How to work with unicode in Python

I am trying to clean all of the HTML out of a string so the final output is a text file. I have done some research on the various 'converters' and am starting to lean towards creating my own dictionary for the entities and symbols and running a…
PyNEwbie
  • 4,882
  • 4
  • 38
  • 86
15
votes
3 answers

UTF-16 string terminator

What is the string terminator sequence for a UTF-16 string? EDIT: Let me rephrase the question in an attempt to clarify. How does the call to wcslen() work?
Ray
  • 153
  • 1
  • 1
  • 4
15
votes
2 answers

Is it possible to write a Swift function that replaces only part of an extended grapheme cluster like ‍‍‍?

I want to write a function that could be used like this: let ‍‍‍ = "‍‍‍".replacingFirstOccurrence(of: "", with: "") Given how odd both this string and Swift's String library are, is this possible in Swift?
Ky -
  • 30,724
  • 51
  • 192
  • 308
15
votes
3 answers

PHP - length of string containing emojis/special chars

I'm building an API for a mobile application and I seem to have a problem with counting the length of a string containing emojis. My code: $str = "✌️ @mention"; printf("strlen: %d" . PHP_EOL, strlen($str)); printf("mb_strlen UTF-8: %d" . PHP_EOL,…
gabo
  • 1,538
  • 14
  • 15
14
votes
4 answers

PDO and UTF-8 special characters in PHP / MySQL?

I am using MySQL and PHP 5.3 and tried this code. $dbhost = 'localhost'; $dbuser = 'root'; $dbpass = ''; $con = mysql_connect("localhost", "root", ""); mysql_set_charset('utf8'); if (!$con) { die('Could not connect: ' .…
sophie
  • 1,523
  • 6
  • 18
  • 31