Non-Latin input with wcin produces character '\0'

Asked Aug 07 '20 at 09:33

Active Aug 07 '20 at 12:17

#include <iostream>
#include <iomanip>

constexpr int SIZE=20;

int main()
{
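    wchar_t input[SIZE+1];                   // not initialized: unused slots hold arbitrary values
    std::wcin >> std::setw(SIZE+1) >> input; // read one word, at most SIZE characters

    input[SIZE] = L'\0';
    
    wchar_t c;
    for(int i=0; i<SIZE; ++i)                // dumps all SIZE slots, including those after the terminator
    {
        c=input[i];
        std::cout << std::setw(4) << std::hex << +c << ' '; // unary + prints the code unit as a number
    }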
}

With this code, every non-Latin character I enter shows up as 0 in its position. For example, if I enter ФФFF, I get

   0    0   46   46    0   40    0    0    8    0    0    0    c    0    0    0 13a0   d1    0    0

I'm running Windows 10, using VSC and compiling as C++11, and the characters I enter are Cyrillic letters (Unicode hex character set). If I hardcode the characters and bypass wcin, I get

  424  424   46   46    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

edited Aug 07 '20 at 09:52


memesaregood

This strongly depends on 1) your operating system 2) environment settings such as region and code page 3) the actual bytes you are passing. Please edit your question with all three so we can help you debug this. – Botje Aug 07 '20 at 09:36
Edited; I've provided all the needed info. At least I think so. – memesaregood Aug 07 '20 at 09:54
Still missing 2 and 3. There are numerous ways to represent those characters, but we need to know which encoding your operating system expects and what encoding you are sending. We can deduce the latter from the exact *bytes* you are sending, the former is something you need to look up. – Botje Aug 07 '20 at 09:58
Uh, I don't really understand. How do I check that? – memesaregood Aug 07 '20 at 10:28
Show us the result of calling the [`chcp`](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/chcp) command. For the other question, enter the same string into a command like `copy con blah.txt` and tell us what bytes end up in blah.txt. – Botje Aug 07 '20 at 10:56
chcp: `Active code page: 65001`; blah.txt: FF – memesaregood Aug 07 '20 at 11:01
Check the bytes with a hex editor. I need to know which bytes correspond to Ф. – Botje Aug 07 '20 at 11:04
There was no Ф in blah.txt; it showed up as 00 00. The Ф characters themselves show up as D0 A4. – memesaregood Aug 07 '20 at 11:09
Okay. That means your OS/console is set up for UTF-8 and your input is also encoded as UTF-8. – Botje Aug 07 '20 at 11:19
Unfortunately getting UTF-8 working with wcin is [a bit of a mess](https://stackoverflow.com/a/48180107/1548468) – Botje Aug 07 '20 at 11:28
Just tried that piece of code, entered the same string (ФФFF), and got ` $F F` – memesaregood Aug 07 '20 at 11:57
UPD: It actually works. Thanks. – memesaregood Aug 07 '20 at 12:01
Can you post an answer so I can mark it as the solution and this question as solved? – memesaregood Aug 07 '20 at 12:07
Unfortunately the accepted answer on that question is different, and copy-pasting the answer seems wrong to me. – Botje Aug 07 '20 at 12:08
