
My problem is certainly not new, so I apologize if my error is simply too stupid.

I just wanted to become familiar with putwchar and simply wrote the following little piece of code:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
  char *locale = setlocale(LC_ALL, "");
  printf ("Locale: %s\n", locale);

  //setlocale(LC_CTYPE, "de_DE.utf8");
  
  wchar_t hello[]=L"Registered Trademark: ®®\nEuro sign: €€\nBritisch Pound: ££\nYen: ¥¥\nGerman Umlauts: äöüßÄÖÜ\n";

  int index = 0;
  while (hello[index]!=L'\0'){
  //printf("put liefert: %d\n", putwchar(hello[index++]));
    putwchar(hello[index++]);
  };
}

Now, the output is simply:

Locale: de_DE.UTF-8
Registered Trademark: ��
Euro sign: ��
Britisch Pound: ��
Yen: ��
German Umlauts: �������
[1]+  Fertig                  gedit versuch.c

None of the non-ASCII chars appeared on the screen.

As you can see from the commented-out line, putwchar returned the proper Unicode code point for each character I wanted to print, so the call should work (at least to my understanding). I am aware that I must not mix putwchar and printf in the same program, which is why that line is commented out. The C source is encoded in UTF-8:

$ file versuch.c
versuch.c: C source, UTF-8 Unicode text

My system is Ubuntu Linux 20.04.05
Compiler: gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

I would greatly appreciate any advice on this one.

As stated above: I simply expected the trademark sign, yen, € and the umlauts äöüßÄÖÜ to appear.

Toby Speight
  • What terminal are you using to view your program's output? Does it expect UTF-8 encoding, and is it using a font that supports those characters? – Brian61354270 Jan 15 '23 at 16:53
  • I'm guessing you have edited and saved the source code as UTF-8, so the special characters in the string are not actually valid `wchar_t` characters, but rather UTF-8 byte sequences. – Some programmer dude Jan 15 '23 at 16:55
  • @Someprogrammerdude shouldn't the compiler recognize those sequences and convert them at compile time? – Mark Ransom Jan 15 '23 at 20:54
  • @Brian: Yes, my terminal expects UTF-8 characters and shows them correctly. And yes, I stored the program source code in UTF-8. To my understanding (and this is admittedly tentative), the "wide characters" are stored as Unicode characters in the binary code, particularly as I put an "L" prefix right before the string constant to indicate "wide characters". I think that is what Mark Ransom wanted to indicate. – Detlef Bosau Jan 15 '23 at 22:11
  • @MarkRansom No, there are no such requirements. The compiler goes through string literals to replace escape sequences, but nothing more. – Some programmer dude Jan 16 '23 at 05:20
  • @Someprogrammerdude there's not much point to a wide string if it's only going to contain single-byte characters. The standard might not require it, but a good compiler would do it anyway. – Mark Ransom Jan 16 '23 at 13:49
  • @Someprogrammerdude No, they are not UTF-8 byte sequences in any implementation known to mankind. – n. m. could be an AI Jan 16 '23 at 17:50

3 Answers


You cannot mix narrow and wide I/O in the same stream (7.21.2). If you want putwchar, you cannot use printf. Start with wprintf instead (with the wide format string):

wprintf (L"Locale: %s\n", locale);
n. m. could be an AI

You shouldn't mix normal and wide output on the same stream.

I get the expected output if I change this early print:

  printf ("Locale: %s\n", locale);

into a wide print:

    wprintf(L"Locale: %s\n", locale);

Then the subsequent putwchar() calls write the expected characters.

Toby Speight

You can simply print those wide characters as shown below:

wprintf(L"Registered Trade Mark: %ls\n", L"®®");
wprintf(L"Euro Sign: %ls\n", L"€€");
wprintf(L"British Pound: %ls\n", L"££");
wprintf(L"Yen: %ls\n", L"¥¥");
wprintf(L"German Umlauts: %ls\n", L"äöüßÄÖÜ");

Please refer:

Gaurav Pathak
  • I simply used these characters as examples of international characters that do not depend on a particular language, unlike, e.g., the German umlauts äöü. – Detlef Bosau Jan 15 '23 at 22:17
  • Does it mean that the `wchar` that you want to display on terminal can be anything? What is the source of input of these `wchar` that you want to display? Are they coming from any file or `stdin` or from any other source? – Gaurav Pathak Jan 16 '23 at 05:18
  • But you didn't mention in your original question that you want to read the data from stdin, nor does your posted code show it. In that case, the answer that I posted is not relevant. – Gaurav Pathak Jan 16 '23 at 08:16
  • Yes. To make it more clear: To my understanding, the functions getwchar and putwchar correspond to each other. getwchar reads a character from stdin and yields a wchar_t to the program; putwchar works the other way round. To my understanding, wchar_t is the character's internal representation, i.e. Unicode in an unsigned 32-bit variable, and this is converted from or to the external representation according to the locale. And I expect the compiler to do the same conversion when I define a wchar_t[] string with a leading L, e.g. wchar_t umlauts[]=L"äöüÄÖÜß". – Detlef Bosau Jan 16 '23 at 08:19
  • @DetlefBosau That can be done. Given that you've declared `wchar_t umlauts[]=L"äöüÄÖÜß";`, you can use `wprintf(L"German Umlauts: %ls\n", umlauts);` to output it on the terminal. But please don't use `printf()` and `wprintf()` in the same code; please refer to https://stackoverflow.com/questions/8681623/printf-and-wprintf-in-single-c-code – Gaurav Pathak Jan 16 '23 at 11:43
  • Pathak: Please consider the comments in my code. Meanwhile, I have understood that byte-oriented functions and wide-character-oriented functions must not be mixed on the same stream. I only used the printf calls experimentally, in order to see the return value of putwchar. However, I'm too focused on Unicode here. I think the idea behind wide chars is simply to offer a 32-bit alternative to the "old" 8-bit chars, and the semantics is left to the user. And particularly for UTF-8, getwchar and putwchar take care of the marshalling/unmarshalling; is this correct? – Detlef Bosau Jan 16 '23 at 17:06
  • And for putwchar(): shouldn't this work here, assuming that I DO NOT USE printf on the same stream? – Detlef Bosau Jan 16 '23 at 17:12
  • @DetlefBosau If you have more questions, please hit the "Ask a question" button and ask them (one question per question). Comments are not meant for this. – n. m. could be an AI Jan 16 '23 at 17:54
  • I missed the printf(locale..... I changed it in the suggested way and now it works, and I see the problem: I confused the output stream with the wrongly used printf. Many thanks for this help. – Detlef Bosau Jan 16 '23 at 22:43