6

How do I print a Chinese or Japanese character (non-English) in C?

wchar *wc = (wchar *)calloc(50,sizeof(wchar*));
wcscpy(wc ,L"john 麥克風");
printf(" wc : %S \n",wc);

I only get john in my output.

Yu Hao
user3336737

5 Answers

4

Software doesn't know about "characters". All it knows are numbers, which happen to be interpreted as characters by whatever other software actually displays your output. (For your software, 'a' is merely an alias for 0x61. Turning that 0x61 into the combination of pixels recognized as "a" is up to your terminal, your GUI, or whatever.)

  1. Handling of non-ASCII characters in your source is implementation-defined, i.e. it's up to your compiler what to do with the Chinese characters in your source. If you want to play it safe (and portable), you will have to write non-ASCII characters using their encoding values, e.g. by using \x notation. (Or the newer \u, but that doesn't make the point as neatly as it already refers to Unicode explicitly.)

  2. It should occur to you at this point that you have to agree with your program about what encoding you are using, or your numerical values would mean something entirely different to your program. This is done by setting the locale appropriately. (There are the encodings ISO-8859-1 and ISO-8859-15 and EBCDIC and ... for single-byte, UTF-8 and UTF-16 and EUC and ISO-2022-JP and ... for multi-byte, UCS-2 and UTF-32 and probably others for wide... while everything should be Unicode IMHO, it clearly isn't, and even without the exotics there are three very different Unicode encodings out there.)

  3. As others already pointed out, if you're doing "wide" output, you should use "wide" output functions. (Be aware that "multibyte" is yet another type of fish entirely, and that there are 8-bit multibyte encodings -- UTF-8 -- as well as 16-bit multibyte encodings -- UTF-16...)

  4. Whatever you are printing your output to (e.g. the terminal, GUI) needs to support your locale / encoding, and needs to have a font at its disposal that actually has glyphs (i.e. printable characters) loaded for the associated code points. (Otherwise you will get "?", some other placeholder, or some funny looking stuff with tiny hexcodes in it.)

While your actual problem most likely rests with using printf() instead of wprintf(), all the above points must work together in order to correctly print non-ASCII characters. Hence, you might have other problems as well, but it's hard to tell.

DevSolar
  • @ajay: I had the opportunity to work with software for the last five years where Unicode and encoding differences were the *main* subject of my everyday work. I did not have clue number one when I started, and boy did I learn in the meantime... I still crack up when I think about the confusion that is Microsoft's take on UCS-2 / UTF-16, and utterly despair when trying to explain the difference to people. ;-) – DevSolar Mar 20 '14 at 09:03
  • @DevSolar: Thanks for your answer. I understand about ASCII and Unicode values. Now I am getting "john ???" in the console when using wprintf(). – user3336737 Mar 20 '14 at 10:14
  • @user3336737: Is your terminal capable of displaying Chinese characters in the first place? What happens if you display your source file (the old one, with the characters) in e.g. Vim? – DevSolar Mar 20 '14 at 11:09
  • I replaced %S with %ls in wprintf(). Now, if the first letter of my input is a Chinese character, a "?" (question mark) is printed in my console. If I open the source file in Vim, the Chinese text "麥克風" is displayed as "麥克é¢". – user3336737 Mar 20 '14 at 12:28
  • @user3336737: Looks as if your IDE, your compiler, and your terminal disagree on the encoding being used. This is tricky to debug, and probably better asked in a Unix/Linux oriented forum (assuming you're on Linux -- I'm not sure if there actually *is* a Unicode-capable terminal on Windows.) – DevSolar Mar 20 '14 at 12:41
2

It's a wide string, so you need wprintf:

The Archetypal Paul
1

Make sure the terminal is using a UTF-8 encoding. C++ itself is Unicode-agnostic. Also, you will need to use std::wcout or wprintf to print those.

WeaselFox
1

This works for me:

#include <locale.h>
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* Use the locale (and thus the encoding) named by the environment. */
    setlocale(LC_ALL, "");

    wprintf(L"Why? 为什么?\n");
    return 0;
}
Yan King Yin
1

I solved this problem on Ubuntu by calling setlocale(LC_ALL, "zh_CN.UTF-8") for Chinese and setlocale(LC_ALL, "ja_JP.UTF-8") for Japanese.

Alternatively, you can change your system's default locale.

Adrian Mole
yyutt1