14

I am experimenting with Unicode characters, taking the code point values from the Wikipedia page.

The problem is that my console displays all of the characters from U+0000 to U+00FF (C0 Controls and Basic Latin through Latin-1 Supplement), but for all other blocks, such as Latin Extended-B, Cyrillic, and other languages, the console prints a question mark (?).

My C# code is

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DataTypes
{
    class Program
    {
        static void Main(string[] args)
        {

            // U+0181 is in Latin Extended-B, outside the range that displays correctly.
            char ch = '\u0181';

            Console.WriteLine("The Unicode character is: " + ch);

        }
    }
}

I am working on Windows 7 with Visual Studio 2010. What should I do to get broader Unicode support?

Mudassir Hasan
  • 3
    You may need to change your console's codepage or font; see http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using – nneonneo Oct 04 '12 at 07:04
  • 3
    This does not sound like a limitation of C#/.NET, but rather: a limitation of the console. For example, my console is in ["ibm850"](http://en.wikipedia.org/wiki/Code_page_850), an 8-bit codepage - there is no chance of writing full unicode on "ibm850". – Marc Gravell Oct 04 '12 at 07:07
  • 1
    Only three fonts are available on my console: Lucida, Consolas, and Raster. Is that the cause of this limitation? – Mudassir Hasan Oct 04 '12 at 07:43
  • 6
    There are far more than 65535 characters in Unicode - are you just interested in the basic multilingual plane? – Damien_The_Unbeliever Oct 04 '12 at 08:01
  • Once I cover these 65535 chars, I will try to proceed further. Right now I am able to print only a few Unicode characters. I have tried changing the font on my console, but still without success. – Mudassir Hasan Oct 04 '12 at 08:04

1 Answer

21

There's a lot of history behind that question, so I'll noodle about it for a while first. Console mode apps can only operate with an 8-bit text encoding. This goes back to a design decision made 42 years ago by Ken Thompson et al. when they designed Unix. A core feature of Unix was that terminal I/O was done through pipes, and you could chain pipes together to feed the output of one program to the input of another. This feature was also implemented in Windows and is supported by .NET as well, through the ProcessStartInfo.RedirectStandardXxxx properties.

A nice feature, but it became a problem when operating systems started to adopt Unicode. Windows NT was the first one that was fully Unicode at its core. Unicode characters must always be encoded; a common choice back then was UCS-2, which later morphed into UTF-16. That creates a problem with I/O redirection: a program that spits out 16-bit encoded characters is not going to operate well when it is redirected to a program that still uses 8-bit encoded characters.

Credit Ken Thompson as well with finding a solution for this problem: he invented the UTF-8 encoding.
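To make the mismatch concrete, here is a minimal sketch that dumps the bytes produced for the character from the question (U+0181) under three encodings. The `EncodingDemo` class and its `Hex` helper are names made up for this illustration, and `Encoding.ASCII` is only a stand-in for a legacy console code page that lacks the character; the actual code page on a given machine will differ.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: hex dump of the bytes an encoding produces for a string.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string text = "\u0181"; // the character from the question

        // UTF-16 (little endian): one 16-bit code unit, so two bytes.
        Console.WriteLine("UTF-16: " + Hex(Encoding.Unicode, text));

        // UTF-8: a sequence of 8-bit units, safe to push through a byte pipe.
        Console.WriteLine("UTF-8:  " + Hex(Encoding.UTF8, text));

        // Stand-in for a legacy encoding without this character:
        // the encoder substitutes '?' (0x3F), which is what the console shows.
        Console.WriteLine("ASCII:  " + Hex(Encoding.ASCII, text));
    }
}
```

The last line is the interesting one: the question mark is not a rendering failure, it is what the encoder substitutes when the target encoding has no byte for the character. UTF-8 avoids the substitution while still producing an 8-bit stream.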

That works in Windows as well. It is easy to do in a console mode app; you have to reassign the Console.OutputEncoding property:

using System;
using System.Text;

class Program {
    static void Main(string[] args) {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine("Ĥėļŀō ŵŏŗłđ");
        Console.ReadLine();
    }
}

You'll now encounter another problem, however: the font selected for the console window is likely to be unable to render the text. Press Alt+Space to invoke the system menu, then choose Properties and the Font tab. You'll need to pick a non-raster font. Pickings are very slim; on Vista and up you can choose Consolas. Re-run your program and the accented characters should render properly. Unfortunately, forcing the console font programmatically is a problem, so you'll need to document this configuration step. In addition, a font like Consolas doesn't have the full set of possible Unicode glyphs. You are likely to see rectangles appear for Unicode code points for which it has no glyphs. All of this is an unsubtle reminder that creating a GUI program is really your best bet.
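One way to see which glyphs the selected console font actually has is to print a whole Unicode block and look for rectangles. The following is only a sketch: the `GlyphProbe` class and its `Range` helper are hypothetical names, the Cyrillic block (U+0400 to U+04FF) is an arbitrary choice, and the helper is meant for Basic Multilingual Plane ranges outside the surrogate area.

```csharp
using System;
using System.Text;

static class GlyphProbe
{
    // Hypothetical helper: builds a string of `count` consecutive BMP
    // characters starting at code point `first`.
    public static string Range(int first, int count)
    {
        var sb = new StringBuilder(count);
        for (int i = 0; i < count; i++)
        {
            sb.Append((char)(first + i));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Without this, characters missing from the code page come out as '?'.
        Console.OutputEncoding = Encoding.UTF8;

        // Print the Cyrillic block, 16 characters per row, each row
        // labelled with the code point of its first character.
        for (int row = 0x0400; row <= 0x04FF; row += 16)
        {
            Console.WriteLine("U+{0:X4}  {1}", row, Range(row, 16));
        }
    }
}
```

Changing the loop bounds lets you probe other blocks. What you see depends entirely on the font chosen in the console's Properties dialog, so the output will differ between machines.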

Hans Passant
  • @Hans Passant This is a great explanation! Thank you for the history lesson :) The one problem I have with creating a GUI program is that it doesn't offer all of the possibilities that the console does as far as working with text. Ponder creating the Matrix Rain in a WinForms application: it is nearly impossible, because you cannot put a piece of text somewhere without using GDI+, and then you encounter giant hurdles like persistence and flickering. All that to say: yes, GUI programs work... most of the time ;) – MaxOvrdrv Sep 16 '14 at 16:35