How to calculate the length of a string by characters, not by code units (UTF-8, UTF-16)?

Asked May 24 '20 at 06:08


I have two simple examples, one UTF-8 and one UTF-16, that get the length of a string:

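```cpp
#include <iostream>
#include <string>

int main() {
    // UTF-8 (C++17; in C++20 u8"" literals are char8_t, so this needs std::u8string)
    std::string str = u8"az";
    std::cout << str.length() << '\n';   // The result: 6

    // UTF-16 (std::wstring is UTF-16 only where wchar_t is 16 bits, e.g. Windows; std::u16string also gives 4)
    std::wstring wstr = L"az";
    std::cout << wstr.length() << '\n';  // The result: 4
}
```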

The first example prints 6 and the second prints 4, but I expect 3 in both, because the length should be counted in characters.
I know this happens because `length()` counts code units: bytes in UTF-8 and 16-bit units in UTF-16.
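To make those code-unit counts concrete, here is a small sketch. It assumes the middle character is one outside the Basic Multilingual Plane, since that is what takes 4 bytes in UTF-8 and a surrogate pair (2 units) in UTF-16; U+1F600 is used purely as a hypothetical stand-in, and the constant names are illustrative. In UTF-32 every code point is exactly one code unit, so the third length is the code-point count.

```cpp
#include <string>

// Hypothetical stand-in text: 'a', U+1F600 (outside the BMP), 'z'.
// The UTF-8 bytes are spelled out with escapes so the example does not
// depend on u8"" literals (which became char8_t in C++20).
const std::string    kUtf8  = "a" "\xF0\x9F\x98\x80" "z";  // 1 + 4 + 1 bytes
const std::u16string kUtf16 = u"a\U0001F600z";             // U+1F600 becomes a surrogate pair
const std::u32string kUtf32 = U"a\U0001F600z";             // one code unit per code point
```

Here `kUtf8.length()`, `kUtf16.length()` and `kUtf32.length()` give 6, 4 and 3, matching the numbers above.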

Is there a way to get the correct length of a UTF-8 or UTF-16 string?
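If "character" means Unicode code point, one way to count without converting is to skip the code units that continue a code point. This is a minimal sketch assuming well-formed input (malformed sequences are not detected), and the function names are hypothetical. On Windows, where `wchar_t` is 16 bits, the UTF-16 version works the same way over `std::wstring`.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {  // ASCII or lead byte: starts a new code point
            ++count;
        }
    }
    return count;
}

// Counts code points in a UTF-16 string. A surrogate pair (high D800-DBFF,
// then low DC00-DFFF) encodes one code point, so low surrogates are skipped.
// Assumes well-formed UTF-16.
std::size_t utf16_code_points(const std::u16string& s) {
    std::size_t count = 0;
    for (char16_t u : s) {
        if (u < 0xDC00 || u > 0xDFFF) {  // anything except a low surrogate
            ++count;
        }
    }
    return count;
}
```

With the stand-in text `a`, U+1F600, `z`, both `utf8_code_points("a" "\xF0\x9F\x98\x80" "z")` and `utf16_code_points(u"a\U0001F600z")` return 3.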


Lion King



Does this answer your question? [Getting the actual length of a UTF-8 encoded std::string?](https://stackoverflow.com/questions/4063146/getting-the-actual-length-of-a-utf-8-encoded-stdstring) – Jan Schultke May 24 '20 at 06:18
It depends on what you think a character is. How many characters are in this string: `a͠e`? What about this `â̠͠é̖`? – n. m. could be an AI May 24 '20 at 06:23
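The second comment's point is that code points are still not what a reader sees as characters. Assuming the mark in `a͠e` is U+0360 COMBINING DOUBLE TILDE, that string is three code points but two grapheme clusters. The sketch below is only a rough approximation with a hypothetical function name: it skips code points in the Combining Diacritical Marks block. Counting user-perceived characters properly follows the grapheme-cluster rules of Unicode UAX #29, which libraries such as ICU implement (for example through a character `BreakIterator`).

```cpp
#include <cstddef>
#include <string>

// Crude approximation of "user-perceived characters": counts code points
// but skips those in the Combining Diacritical Marks block (U+0300-U+036F).
// This is NOT full grapheme-cluster segmentation (UAX #29): it ignores other
// combining blocks, ZWJ emoji sequences, Hangul jamo, regional-indicator
// pairs, and more.
std::size_t approx_visible_chars(const std::u32string& s) {
    std::size_t count = 0;
    for (char32_t cp : s) {
        bool is_basic_combining = cp >= 0x0300 && cp <= 0x036F;
        if (!is_basic_combining) {
            ++count;
        }
    }
    return count;
}
```

For `std::u32string s = U"a\u0360e";`, `s.length()` is 3 while `approx_visible_chars(s)` is 2.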
