How to universally determine the number of characters in a string, regardless of the encoding?

Question

I am interested in how to correctly process multilingual text.

// C++. How to universally determine the number of
// characters in a string, regardless of the encoding?
#include <iostream>
#include <string>
using namespace std;

int main()
{
    // the test string is 13 characters long,
    // but the reported result is 19
    string test_string = "string_строка";

    cout << "String length " << test_string.size() << " characters.\n";

    return 0;
}

I think this is because Latin and Cyrillic characters take up different amounts of memory.
How can this be solved universally? Or simply for Cyrillic? My system is Ubuntu 14.04 (Unity). The compiler is GCC 4.9.1 20140922 (Red Hat 4.9.1-10), 64-bit.
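One way to check the guess about memory per character is to look at the raw bytes of the string. The following is only a minimal sketch: `to_hex_bytes` is a hypothetical helper name (not from the original post), and it assumes nothing about the encoding, it just prints every byte as two hex digits.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of `s` as space-separated
// two-digit uppercase hex values, e.g. "61 D1 81".
std::string to_hex_bytes(const std::string& s)
{
    std::string out;
    char buf[4];  // two hex digits plus the terminating '\0'
    for (unsigned char byte : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

If the source file is saved as UTF-8, each Latin letter of the test string should appear as one byte and each Cyrillic letter as two bytes (for example `D1 81` for "с"), which would account for 7 + 6 * 2 = 19.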

A solution without knowing the encoding and its quirks is just impossible. — deviantfan, Apr 30 '16 at 08:44
I guess your string is UTF-8 (to check, print the byte values). In this case, first decide whether you want code points or glyphs. — deviantfan, Apr 30 '16 at 08:45
@bkVnet wstring is just an array of 2- or 4-byte tuples instead of 1-byte ones. It doesn't change anything about the problem (encoding, Unicode principles, etc.) — deviantfan, Apr 30 '16 at 08:46
@Did_Mazay Glyphs are what you see. Code points are entries in the Unicode "table". In Unicode, sadly, these are not the same thing, as they are in ASCII etc. — deviantfan, Apr 30 '16 at 08:49
What works? ... Don't know what you did now, but if this is more than a small school assignment, you're probably missing something — deviantfan, Apr 30 '16 at 08:49
You need to convert UTF-8 to UTF-16 to get the exact length; have a look at http://stackoverflow.com/questions/18921979/how-to-convert-utf-8-encoded-stdstring-to-utf-16-stdstring?rq=1 — piyushj, Apr 30 '16 at 08:50
@piyushjaiswal For your string, this may work, but in the general case it is completely wrong. — deviantfan, Apr 30 '16 at 08:51
@Did_Mazay Well, I can only suggest that you learn more about charsets. Otherwise, your program may work for 98% of the cases, but not for the rest. — deviantfan, Apr 30 '16 at 08:53
@Did_Mazay Yeah, that's what I feared. You know, such errors can kill people (and they already did). — deviantfan, Apr 30 '16 at 08:56
@deviantfan, but now it really worked. Is this not a universal solution? — Yurii Holskyi, Apr 30 '16 at 08:57
@Did_Mazay Did you even read my comments so far? Yes, it will work for this string, and for 98% or 99% of all other strings your program can get, but not for 100%. => It's *not* a universal solution. And the universal solution is much, much more complex. — deviantfan, Apr 30 '16 at 08:58
@deviantfan, maybe it would be better to delete my question if it is incorrect? — Yurii Holskyi, Apr 30 '16 at 09:00
@Did_Mazay Your question is not incorrect. And being a beginner is fine too. But "it works for this input, ok, I'm done" won't go well, both with charsets and C++. Before you start writing programs that are actually used by other people, here are some keywords that you should understand (really understand): Byte/Codepoint/Glyph, Charset/Encoding, UTF-16 surrogates, BOM, Endianness, collation, and most complex but important: Unicode normalization. — deviantfan, Apr 30 '16 at 09:32
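To illustrate the code point versus glyph distinction from the comments, here is a rough sketch of counting code points in a UTF-8 string. It is not a universal solution: `count_utf8_code_points` is a hypothetical name, the input is assumed to be valid UTF-8 (nothing is validated), and the result is a code point count, not a count of what the reader sees.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in `s`.
// Assumption: `s` holds valid UTF-8. In UTF-8, continuation bytes have the
// bit pattern 10xxxxxx; every other byte starts a new code point, so only
// those non-continuation bytes are counted.
std::size_t count_utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 encoded test string "string_строка" this gives 13, while `size()` gives 19. For "e" followed by the combining acute accent U+0301 (bytes `65 CC 81`) it gives 2, although the text is displayed as the single glyph "é". This is the kind of input where counting code points or UTF-16 units stops matching the visible length; counting what the user perceives as characters needs Unicode text segmentation, typically via a library such as ICU.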


0 Answers