8

I'm trying to write a UTF-8 (Vietnamese) string to the C# console, without success. I'm running Windows 7.

I tried using the Encoding class to convert the string to char[], then to byte[], and then back to a string, but that didn't help. The string comes directly from the database.

Here is an example:

Tôi tên là Đức, cuộc sống thật vui vẻ tuyệt vời

It does not show special characters such as Đ; it shows ? instead, which is even worse than what I get with the Encoding class.

Can anyone try this out, or does anyone know about this problem?


My code

static void Main(string[] args)
{
    XDataContext _new = new XDataContext();
    Console.OutputEncoding = Encoding.GetEncoding("UTF-8");
    string srcString = _new.Posts.First().TITLE;

    Console.WriteLine(srcString);
    // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
    byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
    byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

    // Write the UTF-8 and ASCII encoded byte arrays. 
    Console.WriteLine("UTF-8  Bytes: {0}", BitConverter.ToString(utf8String));
    Console.WriteLine("ASCII  Bytes: {0}", BitConverter.ToString(asciiString));


    // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded  
    // string and write.
    Console.WriteLine("UTF-8  Text : {0}", Encoding.UTF8.GetString(utf8String));
    Console.WriteLine("ASCII  Text : {0}", Encoding.ASCII.GetString(asciiString));

    Console.WriteLine(Encoding.UTF8.GetString(utf8String));
    Console.WriteLine(Encoding.ASCII.GetString(asciiString));
}

And here is the resulting output:

Nhà báo đi hội báo Xuân
UTF-8  Bytes: 4E-68-C3-A0-20-62-C3-A1-6F-20-C4-91-69-20-68-E1-BB-99-69-20-62-C3-
A1-6F-20-58-75-C3-A2-6E
ASCII  Bytes: 4E-68-3F-20-62-3F-6F-20-3F-69-20-68-3F-69-20-62-3F-6F-20-58-75-3F-
6E
UTF-8  Text : Nhà báo đi hội báo Xuân
ASCII  Text : Nh? b?o ?i h?i b?o Xu?n
Nhà báo đi hội báo Xuân
Nh? b?o ?i h?i b?o Xu?n


Press any key to continue . . .
phuclv
DucDigital
  • Setting the output encoding to UTF8 should work: `Console.OutputEncoding = Encoding.UTF8`. Are you sure that the problem is not from the way you are reading the text from the database? If you put a breakpoint in your code, is `srcString` encoded correctly? – Darin Dimitrov Feb 06 '10 at 15:25
  • Yes, the breakpoint output is 100% fine. I'm considering moving to Windows Forms, but I don't need such fancy features in this case. :( Too bad for the Windows console. – DucDigital Feb 06 '10 at 16:26

4 Answers

9
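using System;
using System.Runtime.InteropServices;
using System.Text;

class Program
{
    // Win32 API: sets the output code page of the console attached to this process.
    [DllImport("kernel32.dll")]
    static extern bool SetConsoleOutputCP(uint wCodePageID);

    static void Main(string[] args)
    {
        SetConsoleOutputCP(65001); // 65001 is the UTF-8 code page
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("tést, тест, τεστ, ←↑→↓∏∑√∞①②③④, Bài viết chọn lọc");
        Console.ReadLine(); // keep the window open until Enter is pressed
    }
}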

Screenshot of the output (use Consolas or another font that has all the above characters):

[Screenshot: proof]

Community
Roman Starkov
  • The font is crucial. I tried the code and I got garbage encoding at first, so I didn't expect a font switch to make a difference, but it did. – Timwi Apr 04 '10 at 17:33
  • It seems that `SetConsoleOutputCP` is no longer necessary to get this to work - perhaps something got fixed in the framework. – Roman Starkov Jun 13 '11 at 15:33
1

You will need to set Console.OutputEncoding to match UTF-8.

Probably something like:

Console.OutputEncoding = System.Text.Encoding.UTF8;
Jan Jongboom
  • I've added the example. It's not working at all. My Console.OutputEncoding line is a bit different from yours, but it works the same way. I tried yours too; still the same. – DucDigital Feb 06 '10 at 15:21
  • Thanks. How do I use this with HttpClient? I'm facing an issue: I tried the following, but it is not working for me: request.Content.Headers.ContentType.Parameters.Add(new NameValueHeaderValue("charset", "utf-16")); – Neo Jun 22 '20 at 05:54
0

Does the font you use in the Console window support the characters you are trying to display?

Jesper Palm
  • I did not set the font, but it seems Lucida can't show UTF-8? Is there any way I can change it on the fly with C#? – DucDigital Feb 06 '10 at 15:30
-1

This is a problem with the cmd.exe console: it doesn't support Unicode. [Nothing to do with C#/.NET]

Try changing it to a GUI app if you can or write to a file.
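
For the file route, a minimal sketch could look like the following. The helper name `Utf8FileDemo`, the file name `output.txt`, and the choice to write without a byte order mark are assumptions made for illustration; none of them come from the question.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8FileDemo
{
    // Hypothetical helper: writes the text to a file as UTF-8, then reads it back.
    public static string WriteAndReadBack(string path, string text)
    {
        // UTF8Encoding(false) emits no byte order mark; that choice is an assumption.
        var utf8 = new UTF8Encoding(false);
        File.WriteAllText(path, text, utf8);
        return File.ReadAllText(path, utf8);
    }

    static void Main()
    {
        // Assumed location: a file named output.txt in the temp directory.
        string path = Path.Combine(Path.GetTempPath(), "output.txt");
        string text = "Tôi tên là Đức, cuộc sống thật vui vẻ tuyệt vời";

        string roundTripped = WriteAndReadBack(path, text);
        Console.WriteLine(roundTripped == text ? "round trip OK" : "round trip failed");
    }
}
```

Opening the file in an editor that understands UTF-8 lets you check the text without depending on the console font at all.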

Fakrudeen
  • But it has only limited font support. For example, I can't output in Tamil, although I have Unicode fonts for that language in the OS. That's what I meant by "doesn't support Unicode". – Fakrudeen Apr 05 '10 at 11:49
  • I think it only supports monospaced fonts, and probably (not sure!) doesn't do right-to-left properly, but it should be able to do Tamil if you find a monospaced font with Tamil characters. I tried DejaVu and it doesn't seem to have them. – Roman Starkov Apr 26 '10 at 00:26