Hi, I am trying to print a non-English string in C++, and the output is always ????. For example, I want to print the Korean word '선배' or the Thai word 'ยิ่ง'. The simple code snippet is as follows:

#include <iostream>
#include <string>
using namespace std;

int main() {
    string name("선배"); // or: string name("ยิ่ง");
    int len = name.size();
    cout << "\n name:  " << name;
    cout << "\n length " << len;
}

OUTPUT:

 name:  ??
 length 2

Whereas if I replace the string with English characters:

 string name("ab");

OUTPUT:

name:  ab
length 2

Update: I also tried wchar_t, which also prints question marks.

Code:

const wchar_t *a = L"อดีตรักงานไหม";
wprintf(L"sss : %ls \n", a);

I checked the project properties (Project Properties -> Configuration Properties -> General), and the Character Set is set to 'Use Unicode Character Set'.

Can anybody please tell me what is going wrong? How can I get it to print different languages?

regards

MMH
  • Do you have a charset for Korean included, like ISO-2022-KR? – jawo Oct 21 '14 at 06:13
  • Hi Sempie, sorry, but I am not familiar with charsets. How can I check that? – MMH Oct 21 '14 at 06:22
  • @MMH If, as seems to be the case, this is a Windows console (command line) application, then the problem is probably the font for the Command Prompt window. What happens if you open a command prompt and try to display your source code (just using `more main.cpp` or whatever you called it)? Can you read the embedded strings correctly then? – AAT Oct 21 '14 at 10:07
  • @AAT you are right: when I try to read main.cpp, I cannot read the string; it shows '???'. How do I change the font of the command prompt, and which font should I use? – MMH Oct 22 '14 at 00:32

3 Answers

I'm not familiar with Korean, but in general you need to do two things:

  • Set the correct code page using std::locale OR use Unicode (for example std::wstring and std::wcout).

  • Set your console to a font that can display those characters. The default font in Windows cannot do this.

If you are using Windows, you can set the console font with SetCurrentConsoleFontEx:

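// Requires #include <windows.h>; place this inside a function.
CONSOLE_FONT_INFOEX cfi = {};    // zero-initialize every field first
cfi.cbSize = sizeof cfi;         // the structure size must be filled in
cfi.nFont = 0;
cfi.dwFontSize.X = 0;            // character width
cfi.dwFontSize.Y = 16;           // character height
cfi.FontFamily = FF_DONTCARE;
cfi.FontWeight = FW_NORMAL;
wcscpy_s(cfi.FaceName, L"Consolas");  // face name of the font to use
SetCurrentConsoleFontEx(GetStdHandle(STD_OUTPUT_HANDLE), FALSE, &cfi);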

If you want to set it independently of your application, or you do not have the prerequisites for the function above, you can have a look at the different guides on the internet, for example this one.

I have no clue which fonts support Asian characters; you will need to check this yourself. Any Unicode font should do.

nvoigt
  • Korean was just an example; I tried with Thai as well and got the same output. – MMH Oct 21 '14 at 06:11
  • Because non-Roman character sets generally use double-byte lengths, this could still cause issues. – jawo Oct 21 '14 at 06:20
  • More info about `string` and `wstring` [here](http://stackoverflow.com/a/402918/2710064). Also, don't forget to save the code file as UTF-8 (or in the correct locale); otherwise the characters will simply 'disappear' from the code. – DoubleYou Oct 21 '14 at 06:21
  • @MHH so did you try what I wrote? – nvoigt Oct 21 '14 at 06:21
  • @nvoigt, I used wstring, but having same output. I didn't set the console font though. – MMH Oct 21 '14 at 07:03
  • @MHH Then *do* set the console font? – nvoigt Oct 21 '14 at 07:29
  • @nvoigt Can you please update your answer with how do I set the console font and what font should I set? – MMH Oct 21 '14 at 08:45
  • [Check my answer here](http://stackoverflow.com/a/40337240/3258851) for the first part (using Unicode properly). – Marc.2377 Oct 31 '16 at 07:13

You need to write a byte order mark (BOM) first; then you can print this.

user966379

I am working on a project in Hebrew using Microsoft Visual Studio Community 2019.

When trying to output string literals of non-English characters in any way, I get either question marks in boxes, or just question marks. I checked to see how the command line handles the situation by saving a file with a Hebrew name in Explorer and then accessing it through CMD. Again, question marks.

I am guessing there is a way to include the language packs in the C++ code (that's what I am looking up now), but I saw this and wanted to share what else I found out. Looking at the disassembly of my code, I noticed that the assembler is mishandling the assignment to the register. When the value (the characters) is loaded, the Unicode formatting (right to left) causes the assembler to flip the parentheses and shift the first two (last two) values to the opposite side, resulting in an unusable value in the register:

 eax,dword ptr [ב (0BDF3A0h)] 

Even as I try to save this, it comes out wrong: what it should be is a zero, a right parenthesis, a space, and then the Hebrew character,

which should be [(0בBDF3A0h)].

(Somehow in my code, I now have the Unicode for א outputting the value assigned to it...)

I'm looking at how to handle the issue. Hopefully, you know more than I do :-) Good luck!

More: As variable as string literal

StarShine
  • Looks promising: https://learn.microsoft.com/en-us/windows/win32/intl/creating-a-multilingual-user-interface-application – Brian Grasso Feb 05 '21 at 02:47
  • Perhaps it makes sense to check whether this is true UTF-8, UTF-16, or Microsoft's 16-bit UCS-2 encoding, which is not actually true UTF-8. Hebrew is not my strong point, but you can verify character sets online. These pointers may also help: https://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/ and https://stackoverflow.com/questions/3473295/utf-8-or-utf-16-or-utf-32-or-ucs-2 – StarShine Feb 05 '21 at 08:55
  • But as a general rule on SO, if you have a question, please post it as a proper question and not as an answer in someone else's Q&A thread. Thanks! – StarShine Feb 05 '21 at 08:59