2

1 It's really strange that wprintf shows 'Ω' as 3A9 (UTF-16), but wctomb converts the wchar_t to CEA9 (UTF-8). My locale is the default en_US.utf8. According to the man pages, both should conform to my locale, but wprintf uses UTF-16. Why?

excerpt from http://www.fileformat.info/info/unicode/char/3a9/index.htm

Ω in UTF

UTF-8 (hex) 0xCE 0xA9 (cea9)

UTF-16 (hex) 0x03A9 (03a9)

2 wprintf and printf cannot be used together in the same program; I have to choose one or the other. Why?


See my program:

#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>
#include <locale.h>

int main() {
  setlocale(LC_ALL,""); // inherit locale setting from environment
  int r;
  char wc_char[4] = {0,0,0,0};
  wchar_t myChar1 = L'Ω'; //greek 

  // should comment out either wprintf or printf, they don't run together
  r = wprintf(L"char is %lc (%x)\n", myChar1, myChar1);//On Linux, to UTF16

  r = wctomb(wc_char, myChar1); // On Linux, to UTF8
  r = printf("r:%d, %x, %x, %x, %x\n", r, wc_char[0], wc_char[1], wc_char[2], wc_char[3]);
}
davy
  • I'm not sure what you're asking, but I can tell you UTF-16 is never used in `char` or `wchar_t` on Linux. (And it can't be used on any conformant C implementation.) – R.. GitHub STOP HELPING ICE Oct 09 '11 at 01:22
  • If you run the program, wprintf("%x", myChar1); prints 3a9 (Ω in UTF-16), not cea9 (Ω in UTF-8). – davy Oct 09 '11 at 02:15
  • From what I know, `wchar_t` is 32 bits on Linux. So, as R.. said, it isn't UTF-16. I think the locale only affects the non-wide character functions. (Someone please correct me if I'm wrong.) – Mysticial Oct 09 '11 at 02:27
  • @Mysticial: Other way around. The non-wide functions are purely byte copying, except for `%ls` and `%lc` with `printf` and `scanf`. The wide functions convert all the wide characters they output to the locale's encoding. – R.. GitHub STOP HELPING ICE Oct 09 '11 at 04:12
  • @R..: Thanks, that's good to know. (I obviously don't change my locale very often... XD) – Mysticial Oct 09 '11 at 04:15
  • If you want to know why `printf()` and `wprintf()` can't be run together, please read my answer in another thread: https://stackoverflow.com/questions/17700797/printf-wprintf-s-s-ls-char-and-wchar-errors-not-announced-by-a-compil#answer-60087756 – 71GA Feb 06 '20 at 08:27

3 Answers

6

The answer to your second question has to do with stream orientation. You cannot mix printf() and wprintf() on the same stream because they require different orientations.

When the process starts, the standard streams have no orientation yet. The first I/O call on a stream sets it: printf() makes the stream byte-oriented, and wprintf() makes it wide-oriented.

It is undefined behavior to call a function that requires a different orientation than the one the stream already has.

Mysticial
2

How exactly are you determining what the wprintf line is printing? Your comment below the question seems to imply that you're just examining the result of wprintf("%x", myChar1);, which prints the internal numeric value of myChar1 regardless of character encoding (but not regardless of character set; there's a difference). Assuming that your compiler uses Unicode for wchar_t values internally (a pretty safe bet, I believe), this simply prints the Unicode code point for 'Ω', which is 0x3a9, independently of any UTF-16 vs. UTF-8 distinction. To tell whether wprintf is printing UTF-16, you have to examine the raw bytes that are output (e.g., with hexdump(1)). For example, on my computer, the wprintf line prints the following:

63 68 61 72 20 69 73 20 ce a9 20 28 33 61 39 29 0a
c  h  a  r     i  s     Ω        (  3  a  9  )  \n

Note that the omega is encoded in UTF-8 as the bytes CE A9, but the numeric value of the wchar_t is still 3A9.

jwodder
  • Is some sort of environment variable involved? When I try it on my Ubuntu system, the output is 'char is ? (3a9)'. It looks like wprintf converted the omega to a question mark because it was unaware that I was on a terminal that could display UTF-8. I even set LC_CTYPE to en_US.UTF-8 and it didn't help. – Edward Falk May 25 '12 at 19:28
0

Ahh, I may have found it. You need to execute

setlocale(LC_ALL, "")

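first. It looks like the wchar I/O functions do not honor the LC_ environment variables until you make this call: a C program starts in the "C" locale, and setlocale(LC_ALL, "") is what switches it to the locale named by the environment.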

See http://littletux.homelinux.org/knowhow.php?article=charsets/ar01s08 for more background.

Edward Falk