I have a couple of cases where I want to print Unicode characters in a Windows console, programming in both C# and Visual Basic. Chinese is turning out to be a problem.

I've looked for answers on MSDN and in a couple of Stack Overflow questions. I've also read the Joel on Software article, but I think that was back in 2003, and it's not all that relevant to my problem today.

Here's some example code; it's practically the same as what is proposed by Microsoft but with a slight modification to display Chinese and Greek.

using System;
using System.Text;

class Example
{
    static void Main()
    {
        string unicodeString = "This string contains some Chinese (福更斯)";
        string unicodeString2 = "This string contains some Greek (Αλφάβητο)";

        // Create two different encodings.
        Encoding ascii = Encoding.ASCII;
        Encoding unicode = Encoding.Unicode;

        // Convert the strings into  byte arrays.
        byte[] unicodeBytes = unicode.GetBytes(unicodeString);
        byte[] unicodeBytes2 = unicode.GetBytes(unicodeString2);

        // Perform the conversion from one encoding to the other.
        byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);
        byte[] asciiBytes2 = Encoding.Convert(unicode, ascii, unicodeBytes2);

        // Convert the new byte[] into a char[] and then into a string.
        char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
        ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
        string asciiString = new string(asciiChars);

        char[] asciiChars2 = new char[ascii.GetCharCount(asciiBytes2, 0, asciiBytes2.Length)];
        ascii.GetChars(asciiBytes2, 0, asciiBytes2.Length, asciiChars2, 0);
        string asciiString2 = new string(asciiChars2);

        // Display the strings created before and after the conversion.
        Console.WriteLine("Original string: {0}", unicodeString);
        Console.WriteLine("Ascii converted string: {0}", asciiString);

        Console.WriteLine("Original string: {0}", unicodeString2);
        Console.WriteLine("Ascii converted string: {0}", asciiString2);
    }
}
// The example displays the following output:
//    Original string: This string contains some Chinese (福更斯)
//    Ascii converted string: This string contains some Chinese (???)
//    Original string: This string contains some Greek (Αλφάβητο)
//    Ascii converted string: This string contains some Greek (????????)

(adapted from https://learn.microsoft.com/en-us/dotnet/api/system.text.encoding?view=netframework-4.7)

I compile this in Visual Studio 2017, targeting .NET 4.5.2. I start a console with cmd /K chcp 65001. When I run the .exe, this is what I see:

Original string: This string contains some Chinese (☒☒☒)
Ascii converted string: This string contains some Chinese (???)
Original string: This string contains some Greek (Αλφάβητο)
Ascii converted string: This string contains some Greek (????????)

Here, I'm using ☒ to represent what I see in the console, which is a question mark in a box. If I copy what I see in the console, I get Chinese...

The console uses Consolas, and in Notepad this font displays the Chinese correctly. In the console, however, I only see the Chinese correctly if I switch the font to MS Gothic or NSimSun, and in that case the Greek looks horrible (the characters are spaced further apart than they should be).
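
One way to confirm that the string data is intact and that only the rendering is off is to print the code points as plain ASCII, which any console font can display. This is a minimal sketch; the class name CodePointDump and the helper ToCodePoints are made up for illustration and are not part of the Microsoft example.

```csharp
using System;
using System.Text;

class CodePointDump
{
    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces.
    // Assumption: the sample characters are all in the Basic Multilingual
    // Plane, so one char corresponds to one code point.
    public static string ToCodePoints(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.AppendFormat("U+{0:X4}", (int)c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // The output is pure ASCII, so it does not depend on the console font.
        Console.WriteLine(ToCodePoints("福更斯"));
        Console.WriteLine(ToCodePoints("Αλφάβητο"));
    }
}
```

If the printed values match the characters in the source file, the program is emitting the right text and the boxes come from the console's font handling.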

Is there a simple and reliable way of getting the console to display Chinese?

When the code is finished, it will be distributed as an example to third parties, so I'll also need to explain a simple way of displaying the text correctly. It should work in almost 100% of cases (Windows 10, Visual Studio 2017, .NET 4.5 and later) without needing to download and install extra components (extra fonts, for instance).

Keith
  • Converting from Unicode (136755 characters as of June 2017) to ASCII (128 characters since June 1963) is lossy so I can't see what's to be gained by the code. – Tom Blodget Jun 26 '17 at 11:47
  • It's normal that converting Unicode to ASCII loses information, and that part of the example isn't very instructive here. On the other hand, the ASCII-converted text shows a full-size question mark, whereas the Unicode line shows a small question mark in a box (I had to use a ballot box ☒ to represent that, because I couldn't find the question mark in a box). – Keith Jun 26 '17 at 12:35
  • The small question mark in a box is the rendering of your actual text. The problem is entirely with fonts. Perhaps your users have fonts already installed and selected that work in their consoles. – Tom Blodget Jun 26 '17 at 12:39
  • Possible duplicate of [Set C# console application to Unicode output](https://stackoverflow.com/questions/38533903/set-c-sharp-console-application-to-unicode-output) – Clint Jun 26 '17 at 13:11
  • You need a fixed-pitch font that has Chinese glyphs. Not that hard to find, on a machine with the Chinese language version of Windows. But sure, pretty hard to find on a machine owned by somebody named "Keith". – Hans Passant Jun 26 '17 at 13:30
  • @Tom Blodget: Yes, I get that the problem is with the font. If I copy the text rendered in the console as question marks in boxes and then paste it into Notepad, I see Chinese, and this happens even when the console and Notepad use the same font. I get the same result with both Consolas and Lucida Console. @Clint: that link might have some interesting info, I'll see if I can make use of it. – Keith Jun 26 '17 at 13:45
  • Well, after trying a number of different fonts, I decided that the simplest solution would be to instruct users to run the executable with a redirection of stdout to a text file, then to open that text file in an application that is capable of correctly displaying Unicode characters. My build and test environment doesn't have much extra software (no MS Office, for example), so I leave it to the users to choose an application to display the text. – Keith Jul 03 '17 at 07:58
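
The redirection workaround from the last comment could be sketched roughly as follows. This assumes that setting Console.OutputEncoding to UTF-8 in code is acceptable in place of relying on chcp 65001; the class name Utf8OutputExample and the helper WriteSample are hypothetical names for this sketch.

```csharp
using System;
using System.IO;
using System.Text;

class Utf8OutputExample
{
    // Writes the two sample lines to any TextWriter, so the same code
    // serves the console and a redirected stdout.
    public static void WriteSample(TextWriter writer)
    {
        writer.WriteLine("This string contains some Chinese (福更斯)");
        writer.WriteLine("This string contains some Greek (Αλφάβητο)");
    }

    static void Main()
    {
        // Ask .NET to encode console output as UTF-8, independently of
        // the code page the console was started with.
        Console.OutputEncoding = Encoding.UTF8;
        WriteSample(Console.Out);
    }
}
```

Running it as Utf8OutputExample.exe > output.txt and opening output.txt in an editor that understands UTF-8 should show both lines. Whether the console window itself shows the Chinese glyphs still depends on the selected font.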
