
I want to display an Arabic message mixed with Chinese using wcout.

The following code is OK:

#include <iostream>

using namespace std;

int main()
{
    wcout.imbue(locale("chs"));
    wcout << L"中文"; // OK
}

However, the following code doesn't work:

#include <iostream>

using namespace std;

int main()
{
    wcout.imbue(locale(/* What to place here ??? */));
    wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文"; // Output nothing. VC++ 2012 on Win7 x64
    // Why does the main advantage of Unicode not apply here?
}

I think the concept of code pages should have been deprecated after the adoption of Unicode.

Q1. What is the mechanism by which wcout displays such text?

Q2. Why does Windows, as a Unicode-based OS, not support outputting Unicode characters in its console window?

ruakh
xmllmx
  • What problems do you have with the code above? – Ivaylo Strandjev Feb 04 '13 at 16:35
  • wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文"; // The output is not as expected. VC++ 2012 – xmllmx Feb 04 '13 at 16:37
  • Maybe take a look [here](http://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app) – Ivaylo Strandjev Feb 04 '13 at 16:40
  • It is implementation-defined (as everything that has to do with string literals that go beyond the basic execution character set). – n. m. could be an AI Feb 04 '13 at 16:40
  • @Andy, Windows 7 64-bit – xmllmx Feb 04 '13 at 16:43
  • @xmllmx: maybe [this](http://stackoverflow.com/questions/4406895/what-stdlocale-names-are-available-on-common-windows-compilers) might help you then? – Andy Prowl Feb 04 '13 at 16:44
  • Could you explain your goals in more detail? Do you want to create a console application that **produces Unicode output** in a mix of languages, or do you want **to display** the application's output in the Windows console? Are the C++ functions important to you, or can the application use the Windows API, with no C++ classes, written in pure C? Is redirecting the application's output to a file also important to you? – Oleg Feb 08 '13 at 17:12
  • @Oleg, I just wonder how to produce Unicode output in a mix of languages in pure C++. Windows API-based solutions are not what I want. I want to understand the mechanism by which such a Unicode string is output to the console. – xmllmx Feb 08 '13 at 17:23
  • You probably don't clearly understand the steps of the solution. First you need to create a console application, and console applications are implemented differently on different target OSes. If you want to create a console application that runs on Windows, you should first understand how to implement it without running into problems from the Windows configuration, the console configuration (such as `chcp 65001`), the choice of fonts, and issues specific to particular C++ libraries. – Oleg Feb 08 '13 at 17:32
  • Just try executing the code `_setmode(_fileno(stdout), _O_WTEXT); std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文Русский" << std::endl;`, for example, but run `chcp 65001` in the console **before** starting your application. Then start it with `>%temp%\t.txt` to redirect the results to a file, and open the `%temp%\t.txt` file in Notepad. You will see the text "أَبْجَدِيَّة عَرَبِيَّة‎中文Русский" correctly. – Oleg Feb 08 '13 at 17:58
  • @Oleg, notepad.exe is no problem. The problem remains in console. – xmllmx Feb 08 '13 at 18:20
  • What is your real problem? Do you want to display the results on *your* computer, or do you want to create a console application that displays the same information on *every Windows computer*? The latter is not possible. If you create an application for other people, you should consider not making it a console application. Console applications were regarded as *legacy applications* even in the days of Windows NT 3.1 (more than 20 years ago). Their main design goal was compatibility with old applications, which is why the code pages of that earlier world are still in use. – Oleg Feb 08 '13 at 19:13
  • This is likely a mission impossible. Thanks, Oleg. – xmllmx Feb 08 '13 at 20:21
  • @xmllmx: You are welcome! Sorry for the bad news, but using [_setmode](http://msdn.microsoft.com/en-us/library/tw4k6df8.aspx) with `_O_U16TEXT`, `_O_U8TEXT` or `_O_WTEXT` really is enough to enable Unicode mode *in the console application*. To see the results, one has to use the UTF-8 code page (execute `chcp 65001` in cmd). The final requirement is a console font that can display the characters. That last requirement is the hardest to meet on a typical computer, so the only safe way is to redirect the results to a file, which is really helpful only in rare scenarios. – Oleg Feb 09 '13 at 20:04
  • Maybe the console font does not support your characters. – Kastaneda Oct 08 '14 at 17:26
  • system("chcp 65001"); system("chcp 936"); – Zhang May 13 '20 at 03:46
  • C++ in 2023 still has no easy way to use std::wcout to output Chinese. – huang Jan 31 '23 at 09:56

6 Answers


The CRT treats all output to files as ANSI by default. You can change that with this line at the start of your program:

_setmode(_fileno(stdout), _O_WTEXT); // _setmode and _fileno come from <io.h>, _O_WTEXT from <fcntl.h>

A good reference: http://www.siao2.com/2008/03/18/8306597.aspx

For reference, bidirectional-language support is limited in most command prompts, and as far as I understand, that limitation is what causes the issue here. Why it is not supported is something I cannot answer.

Shog9
allen
  • @xmllmx Might be your font. Tried with Courier New and I see the Arabic but not the Chinese. – Joel Rondeau Feb 04 '13 at 17:04
  • @xmllmx: can you *type* Arabic characters in the console? – n. m. could be an AI Feb 04 '13 at 17:04
  • Try just copying and pasting the Arabic text into the shell. That'll tell you if the font supports it. – Gort the Robot Feb 04 '13 at 17:04
  • The first thing to check is whether you get the desired output when you redirect it to a file. If you do, it is most probably a character-encoding limitation of the cmd prompt itself. There are many references on Stack Overflow about this. – allen Feb 04 '13 at 17:12
  • That must be the reason. Your console font probably does not support them. – n. m. could be an AI Feb 04 '13 at 17:13
  • @allen, Why does Windows, as a unicode-based OS, not support outputting unicode characters in its console window? – xmllmx Feb 04 '13 at 17:27
  • @xmllmx: That is not a constructive question. – ruakh Feb 04 '13 at 17:34
  • @xmllmx I'm sure you can easily find blogs about this, but honestly I don't know the exact reasoning to answer it anyway (above my pay grade too :) ) – allen Feb 04 '13 at 17:38
  • This is not so much about Unicode, but about [Bidi](http://blogs.msdn.com/b/oldnewthing/archive/2012/10/26/10362864.aspx). – Remus Rusanu Feb 04 '13 at 19:05
  • As far as I know, the console doesn't use Uniscribe to output text. Arabic text will not show up correctly without proper contextual shaping. It's sort of connected to bidi, but not quite. Other complex scripts like Hindi (which is left-to-right) won't work either. – cleong Feb 15 '13 at 06:56

You cannot portably print wide strings using standard C++ facilities.

Instead you can use the open-source {fmt} library to portably print Unicode text. For example (https://godbolt.org/z/nccb6j):

#include <fmt/core.h>

int main() {
  fmt::print("أَبْجَدِيَّة عَرَبِيَّة‎中文");
}

prints

أَبْجَدِيَّة عَرَبِيَّة‎中文

This requires compiling with the /utf-8 compiler option in MSVC.

For comparison, writing to wcout on Linux (https://godbolt.org/z/h9WKsY):

std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";

prints

???????????? ?????????????

unless you switch the global locale to, e.g., en_US.utf8. A similar issue exists on Windows, with no standard way to fix it (you have to use non-standard CRT functions or the Windows API).

Disclaimer: I'm the author of {fmt}.

vitaut
  • In C++20, std::wprint(L"中文") still outputs an empty string; I'm not sure whether it is equivalent. – huang Jan 31 '23 at 09:34

I just read this article

"To the summary...

If you use Visual C++ you can't use UTF-8 to print text to std::cout.

If you still want to, please read this amazingly long article about how to make wcout and cout working, but it does not really give a simple solution - finally falling to redefinition of the stream buffers..." http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/

(from this blog http://blog.cppcms.com/post/105)


You can try this:

I assume you were able to render Chinese-only text. That means you have Chinese font files.

Please try Arabic-only text. If it renders, you have an Arabic font on your system.

But when you mix the two, Arabic + Chinese, you need to force the use of a font file that has both glyph sets. I think the default console font used to display wcout's output doesn't have the Arabic glyphs.

I assume you may be getting boxes for the Arabic characters.

Ritesh
#include <iostream>
#include <io.h>
#include <fcntl.h>

int main() {
    _setmode(_fileno(stdout), _O_U16TEXT); // or _O_WTEXT
    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文" << std::endl;
}

http://www.cplusplus.com/forum/beginner/126557/

Shen Yu

On Windows

I recommend redirecting the wcout buffer to a file to make the results easy to view, because the Windows command prompt is unable to display some Unicode characters.
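#include <fstream>
#include <iostream>
#include <locale>
#include <memory>

int main()
{
    std::locale myloc("en_US.UTF-8");
    std::locale::global(myloc); // the wfilebuf below picks up the global locale for conversion

    std::wfilebuf wfbuf;
    wfbuf.open("result.txt", std::ios::out);
    std::wstreambuf* old = std::wcout.rdbuf(std::addressof(wfbuf)); // keep the original buffer

    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";

    std::wcout.rdbuf(old); // restore before wfbuf is destroyed
    return 0;
}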

On Linux

Method1

#include <iostream>
#include <locale>

int main()
{
    std::ios::sync_with_stdio(false); // make wcout no longer depend on stdio
    std::locale myloc("en_US.UTF-8");
    std::wcout.imbue(myloc); // imbue before the first output
    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";

    return 0;
}

Method2

#include <cwchar> // std::wprintf
#include <locale>

int main()
{
    std::locale myloc("en_US.UTF-8");
    std::locale::global(myloc); // a named global locale also sets the C library locale (setlocale)
    std::wprintf(L"أَبْجَدِيَّة عَرَبِيَّة‎中文");

    return 0;
}

Method3

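#include <iostream>
#include <locale>

int main()
{
    std::locale myloc("en_US.UTF-8");
    std::locale::global(myloc); // a named global locale also sets the C library locale (setlocale)
    // With libstdc++, wcout is synchronized with stdio by default, so the C locale set above
    // does the conversion; it doesn't matter that wcout's own locale is still the "C" locale.
    std::wcout << L"أَبْجَدِيَّة عَرَبِيَّة‎中文";

    return 0;
}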
name-1001