I need to write a program that reads and writes Greek letters. Since Greek is not in ASCII, I set the console locale to UTF-8. I managed to make some strings work in this multibyte mess, until I got to the part where I need user input.

So I'll only post the problematic part:

while(1) {
    char inputc[50];
    memset(inputc,'\0',sizeof inputc);
    scanf("%49s",inputc);   /* width limit keeps scanf within the 50-byte buffer */
    printf("%s",inputc);
}

With any Greek character, this prints something different from the input. Also, if I do printf("%d",inputc[i]); for each element to get the int value (it is negative for Greek letters), I get a different value than when reading the same character from a literal.
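When comparing what scanf stored with what a literal contains, it helps to look at the raw bytes as unsigned hex rather than with %d on a plain char, which prints bytes of 0x80 and above as negative numbers wherever char is signed. The following is only a sketch; `bytes_to_hex` is a hypothetical helper name, not something from the question.

```c
#include <stdio.h>
#include <string.h>

/* Write the bytes of s into out as two-digit hex values separated by
   spaces. The cast to unsigned char avoids the negative values that a
   plain (signed) char gives for bytes >= 0x80. */
void bytes_to_hex(const char *s, char *out, size_t outsize)
{
    size_t used = 0;
    if (outsize == 0) return;
    out[0] = '\0';
    for (size_t i = 0; s[i] != '\0'; i++) {
        int n = snprintf(out + used, outsize - used, "%s%02X",
                         i ? " " : "", (unsigned)(unsigned char)s[i]);
        if (n < 0 || (size_t)n >= outsize - used) break;  /* out is full */
        used += (size_t)n;
    }
}
```

For the UTF-8 encoding of Greek capital Α, `bytes_to_hex("\xCE\x91", buf, sizeof buf)` fills `buf` with `CE 91`. Printed with %d as signed chars, the same two bytes show up as -50 and -111, so a different number for the same letter means the bytes arrived in a different encoding.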

MD XF
dac1n
  • I can only help you by upvoting your question... good luck. – Jean-François Fabre Dec 13 '16 at 21:30
  • Rather than describe the input and output seen, post the exact input used and the results. Also post the code that does the "set the console locale to utf8" and show that the change itself did not error. – chux - Reinstate Monica Dec 13 '16 at 21:30
  • For example, input of Α (Greek capital A) returns � and the command used is SetConsoleOutputCP(CP_UTF8); – dac1n Dec 13 '16 at 21:36
  • I have not read [this answer](http://stackoverflow.com/a/15528399/4142924) in detail but is it any use to you? – Weather Vane Dec 13 '16 at 21:41
  • I'm sorry, but no. I added locale and other stuff. Still, the console either prints nothing or prints weird characters. – dac1n Dec 13 '16 at 22:07
  • Code that showed `if(SetConsoleOutputCP(CP_UTF8) == 0) { exit(-1); }` would be stronger evidence that that part of the code succeeded than simply describing what code was used. AFAIK, the problem is there. Posting complete minimal code would be very useful. You were on a good track with `printf("%d",inputc[i]);`, but again, you have not posted those values. – chux - Reinstate Monica Dec 13 '16 at 22:16
  • I don't understand what you want me to post. I made a new program which is literally just these lines, with SetConsoleOutputCP(CP_UTF8) before them. That's the entire main. – dac1n Dec 13 '16 at 22:21
  • What was the result value of `SetConsoleOutputCP(CP_UTF8)`? [Ref](https://msdn.microsoft.com/en-us/library/windows/desktop/ms686036(v=vs.85).aspx) – chux - Reinstate Monica Dec 13 '16 at 22:22
  • Posting the results of `printf("%d",inputc[i]);` is useful. So far, it is known that they are negative for Greek and different than expected. Please also post how the code determined the value from the literal. By posting the values, it would be easier to get to the root of the problem. My suspicion is that these values are the same, it is just how you displayed/determined the values that is amiss. – chux - Reinstate Monica Dec 13 '16 at 22:29
  • It returns 1. If it's not much trouble, could you add a Greek keyboard layout in 10 seconds and test the code I posted, and see if it prints what you typed? – dac1n Dec 13 '16 at 22:30
  • After passing the input to the array, I printed each element with printf("%d",inputc[i]). The results for typing ΑΒΓΔ (the first four Greek letters) were -128-127-126-125 – dac1n Dec 13 '16 at 22:34
  • If it's useful: if I write Greek Α to a txt file instead and read it back, I get the code -50, which is different from -128 – dac1n Dec 13 '16 at 22:37

1 Answer

the command used is SetConsoleOutputCP(CP_UTF8);

That only affects stdout (printf). To set stdin (scanf) as well, you would also need to call SetConsoleCP(CP_UTF8). If you set one but not the other, the input and output characters will naturally differ.
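As a rough sketch of setting both code pages and checking both results, something like the following could be used. The function name `use_utf8_console` is made up for illustration, and the non-Windows branch that simply reports success is an assumption so the snippet compiles everywhere.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 on success, 0 on failure. On non-Windows builds there is no
   console code page to change, so this reports success and does nothing. */
int use_utf8_console(void)
{
#ifdef _WIN32
    /* Output side: affects what printf sends to the console. */
    if (!SetConsoleOutputCP(CP_UTF8)) {
        fprintf(stderr, "SetConsoleOutputCP failed: %lu\n",
                (unsigned long)GetLastError());
        return 0;
    }
    /* Input side: affects the bytes scanf receives from the console. */
    if (!SetConsoleCP(CP_UTF8)) {
        fprintf(stderr, "SetConsoleCP failed: %lu\n",
                (unsigned long)GetLastError());
        return 0;
    }
#endif
    return 1;
}
```

Calling this once at the start of main, before any scanf or printf, keeps the two directions consistent. It does not work around the console problems described next.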

However, please be aware that there are serious bugs in the Windows Console when set to code page 65001/CP_UTF8 (or generally any multi-byte code page that doesn't have special support, i.e. those that aren't legacy locale-default code pages). Windows reports byte counts incorrectly in this state, leading to print calls that mangle and repeat output, and scan calls that hang. This is not generally a feasible way of getting Windows programs to talk Unicode.

bobince
  • I also tried SetConsoleCP and setlocale. If I use both of the console code page calls, I can print a Greek literal but Greek input is broken. If I remove them, I can input Greek but it breaks inside the program. I don't know what else to do; I have literally tried everything. On Ubuntu it works great without any extra command. – dac1n Dec 14 '16 at 11:32
  • Well, yeah, Windows Console is broken. It can't do Unicode in any POSIX-compatible way. If you really *have* to make Unicode IO look right on a Windows console you have no option but to detect that you're running on Windows and talking to a console, and switch code paths. For the Windows path the ways that work are (a) using `_setmode` with `_O_U8TEXT` or `_O_U16TEXT` and then *only* ever using the wide-character interfaces on those streams, else it blows up; or (b) calling the Win32 ReadConsoleW/WriteConsoleW APIs. – bobince Dec 14 '16 at 11:41
  • Thank you very much for the answer. I'm surprised to see Ubuntu working out of the box; its terminal is probably more programming-oriented than Windows cmd. – dac1n Dec 14 '16 at 12:51
  • There are tons of questions on StackOverflow related to how to use Unicode with Windows console input/output. – Remy Lebeau Dec 14 '16 at 22:21
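Option (a) from the comment above might look roughly like the sketch below. It assumes the Microsoft C runtime on Windows (`_setmode`, `_fileno`, `_O_U16TEXT`) and falls back to `setlocale` elsewhere. The names `enable_wide_console`, `count_greek` and `echo_greek_words` are invented for this example, and it has not been verified on every Windows console configuration.

```c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

#ifdef _WIN32
#include <io.h>
#include <fcntl.h>
#endif

/* Switch stdin and stdout to wide-character mode. Returns 1 on success.
   After this call, only wide functions (wscanf, wprintf, ...) may be
   used on these two streams. */
int enable_wide_console(void)
{
#ifdef _WIN32
    if (_setmode(_fileno(stdin), _O_U16TEXT) == -1) return 0;
    if (_setmode(_fileno(stdout), _O_U16TEXT) == -1) return 0;
    return 1;
#else
    /* Assumption: the environment's locale is UTF-8, as on Ubuntu. */
    return setlocale(LC_ALL, "") != NULL;
#endif
}

/* Count characters in the Greek and Coptic block (U+0370..U+03FF).
   Each Greek letter is a single wchar_t, not two bytes. */
int count_greek(const wchar_t *s)
{
    int n = 0;
    for (; *s != L'\0'; s++)
        if (*s >= 0x0370 && *s <= 0x03FF) n++;
    return n;
}

/* Echo each whitespace-delimited word with its Greek letter count. */
void echo_greek_words(void)
{
    wchar_t word[50];
    while (wscanf(L"%49ls", word) == 1)
        wprintf(L"%ls (%d Greek)\n", word, count_greek(word));
}
```

A main function would call `enable_wide_console()` once before any I/O and then `echo_greek_words()`. Mixing narrow calls such as printf on the same streams afterwards is what the comment warns against.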