
Learning C++, I'm trying to find a way to display UTF-16 characters by appending the four digits after the "\u". But, for example, if I try to directly add 0000:

string temp = "\u" + "0000";

I get the error: incorrectly formed universal character name. So is there a way to get these two to form one Unicode character? Also, I realize that the last four digits range over 0-F, but for now I just want to focus on the 0-9 characters.

How can I combine "\u" with a different string?

Edit: I was looking for the C++ equivalent of the JavaScript function:

String.fromCharCode()
Kuang
    Why are you breaking them up into two strings? How much do you understand about programming? The compiler translates special sequences between quotes into Unicode characters (obviously at compile time). What are you really trying to accomplish? – NetMage Dec 20 '17 at 23:36
  • 2
    "*trying to find a way to display UTF-16 characters*" - then use a UTF-16 string type, like `std::wstring` on Windows, or `std::u16string` in C++11 and later. And use an API that can display Unicode strings. What you are doing is not an even remotely correct way to handle Unicode strings in C++ – Remy Lebeau Dec 20 '17 at 23:45
  • I was trying to create a function to return a character with code \uXXXX by passing in XXXX. I just needed a way to parse \u with XXXX to do this – Kuang Dec 20 '17 at 23:54
  • What encoding do you want it added in? `UTF8`? – Galik Dec 20 '17 at 23:59
  • `\unnnn` (where `n` = `0-9A-F`) is a convention to add unicode code points to a string. The *format* those code points take depends on their *encoding*. What *encoding* do you want? `UTF-8`? `UTF-16`? `UTF-32`? – Galik Dec 21 '17 at 00:09
  • Preferably UTF-16 – Kuang Dec 21 '17 at 00:34
  • Maybe there is a solution: https://stackoverflow.com/questions/12015571/how-to-print-unicode-character-in-c/ – Iro Jun 25 '19 at 19:50

3 Answers


You can't say `"\u" + "0000"`, because the parsing of escape sequences happens early in translation, before the actual compilation begins. By the time string literals would be tacked together, escape sequences have already been parsed and won't be parsed again. And since `\u` on its own is not a valid escape sequence, you get an error about it.

cHao

You can't split a string literal like that. The special sequence inside the quotes is a directive to the compiler to insert the relevant Unicode character at compile time, so if you break it into two pieces it is no longer recognized as a directive.

To programmatically generate a UTF-16 character from its Unicode codepoint number you can use the Standard Library's codecvt conversion facets. Unfortunately they provide no direct conversion between UTF-32 (raw Unicode codepoints) and UTF-16, so you have to go through UTF-8 as an intermediate value:

#include <codecvt>    // std::codecvt_utf8, std::codecvt_utf8_utf16
#include <locale>
#include <stdexcept>  // std::runtime_error
#include <string>

// UTF-16 may encode a codepoint as either one or two char16_t code units,
// so we return a string that can hold both.
std::u16string codepoint_to_utf16(char32_t cp)
{
    // convert UTF-32 (standard unicode codepoint) to UTF-8 intermediate value
    char utf8[4];
    char* end_of_utf8;

    {
        char32_t const* from = &cp;

        std::mbstate_t mbs{}; // zero-initialize the conversion state
        std::codecvt_utf8<char32_t> ccv;

        if(ccv.out(mbs, from, from + 1, from, utf8, utf8 + 4, end_of_utf8))
            throw std::runtime_error("bad conversion");
    }

    // Now convert the UTF-8 intermediate value to UTF-16

    char16_t utf16[2];
    char16_t* end_of_utf16;

    {
        char const* from = nullptr;

        std::mbstate_t mbs{}; // zero-initialize the conversion state
        std::codecvt_utf8_utf16<char16_t> ccv;

        if(ccv.in(mbs, utf8, end_of_utf8, from, utf16, utf16 + 2, end_of_utf16))
            throw std::runtime_error("bad conversion");
    }

    return {utf16, end_of_utf16};
}

int main()
{
    std::u16string s; // can hold UTF-16

    // iterate through some Greek codepoint values
    for(char32_t u = 0x03b1; u < 0x03c9; ++u)
    {
        // append the converted UTF-16 characters to our string
        s += codepoint_to_utf16(u);
    }

    //  do whatever you want with s here...    
}
Galik

What you're trying to do is not possible. C++ translation is split into multiple phases. Per [lex.phases], escape sequences are converted (in translation phase 5) before adjacent string literals are concatenated (phase 6).