I read for about 30 minutes and didn't find anything specific about this on this site.

Suppose the following, in C#, console application:

ConsoleKeyInfo cki;
cki = Console.ReadKey(true);
Console.WriteLine(cki.KeyChar.ToString()); //Or Console.WriteLine(cki.KeyChar) as well
Console.ReadKey(true);

Now, let's type ¿ into the console and assign it to cki via Console.ReadKey(true). What gets shown isn't the ¿ symbol; the ¨ symbol is shown instead. The same happens with many other characters. Examples: ñ shows ¤, ¡ shows -, ´ shows ï.

Now, let's take the same code snippet and add a few things for more Console.ReadLine()-like behavior:

string data = string.Empty;
ConsoleKeyInfo cki;
for (int i = 0; i < 10; i++)
{
    cki = Console.ReadKey(true);
    data += cki.KeyChar;
}
Console.WriteLine(data);
Console.ReadKey(true);

The question: how do I handle this the right way, so that data stores and prints the right characters rather than things like ¨, ¤, -, ï, etc.?

Please note that I want a solution that works with ConsoleKeyInfo and Console.ReadKey(), not one that uses other variable types or read methods.

EDIT:

Because the ReadKey() method of the Console class depends on Kernel32.dll, which definitely handles extended ASCII and Unicode badly, simply finding a valid conversion for what it returns is no longer an option.

The only valid way to handle the bad behavior of ReadKey() is to take the cki.Key property that is set when cki = Console.ReadKey(true) executes, apply a switch to it, and return the right value depending on which key was pressed.

For example, to handle a press of the Ñ key:

string data = string.Empty;
ConsoleKeyInfo cki;
cki = Console.ReadKey(true);
switch (cki.Key)
{
    case ConsoleKey.Oem3:
        if ((cki.Modifiers & ConsoleModifiers.Shift) != 0) // Handlers for Alt and Control could be added; omitted here to keep the code small and simple
            data += "Ñ";
        else
            data += "ñ";
        break;
}
Console.WriteLine(data);
Console.ReadKey(true);

So, now the question has a wider focus... Which other functions complete their execution after only one key press and return what was pressed (a substitute for ReadKey())? I think there are no such substitutes, but a confirmed answer would be useful.

mishamosher
  • Mmm... well, from this, one question comes to mind: why does the Console.ReadLine() method behave differently? It does get the right characters like ñ, ¿, ´, and it stores them correctly in strings or chars. Things would be easier if Microsoft just released the source code of their methods :D – mishamosher Mar 31 '12 at 08:30
  • Well, they did. http://referencesource.microsoft.com/netframework.aspx – Hans Passant Mar 31 '12 at 08:35
  • And about the console not supporting Unicode, I'm not convinced... two quick search results say the opposite: http://stackoverflow.com/questions/1259084/what-encoding-code-page-is-cmd-exe-using and http://www.perlmonks.org/?node_id=329433 – mishamosher Mar 31 '12 at 08:39
  • Well, I've been reading the sources and found 2 interesting differences. Console.ReadKey() uses `buffer.keyEvent.uChar` to return the character entered in the console. buffer is of type `Win32Native.InputRecord` (no sources for Win32Native.InputRecord available, totally proprietary), so `keyEvent` and `uChar` will remain unknown to programmers :S. The point is that buffer doesn't handle extended ASCII/Unicode correctly. ReadLine(), on the other hand, is inherited from System.IO and uses a StringBuilder and a Read() to do its work, and it does the job right! – mishamosher Mar 31 '12 at 09:08
  • Okay, Console.Read() comes from Console.In, uses the System.Runtime.InteropServices In and Out attributes and a char array; it also has three parameters, which are the char[], index and count. It does a simple char cast of the int value returned for whatever is in the console, implicitly using the Convert.To methods. ReadKey(), in contrast, needs to handle things like Control, Alt, Shift and function keys, and somewhere along the way it seems extended ASCII and Unicode support was dropped in `Win32Native.InputRecord`. So, that's all: ReadKey doesn't support extended ASCII/Unicode! – mishamosher Mar 31 '12 at 09:21
  • The methods from the source, separated by //Other comments, are here as proof of everything written above :) http://pastebin.com/c7AvyYLr – mishamosher Mar 31 '12 at 09:24
  • Well, I researched, researched, decompiled, did some illegal things, and nope, ReadKey() doesn't handle extended ASCII and Unicode characters correctly, and it won't unless somebody rewrites the Windows Kernel32.dll. So there's no way to accomplish this and properly handle the unexpected results of ReadKey(). Because of this, I changed some parts of the question in the EDIT part to be more realistic :) – mishamosher Mar 31 '12 at 10:06

2 Answers

The problem is not that the Console doesn't know how to deal with Unicode (it does, and correctly, check out this thread). The problem lies in your understanding of a keypress on your keyboard, the translation into keycodes, the translation of keycodes into characters and how the ReadKey() method works.

First of all: if you want to read consecutive characters, use Console.ReadLine() instead, it does all the math for you, and better.

Let's take a look at the following program:

Console.WriteLine("Press a key to start (Enter to stop).");

var key = Console.ReadKey();
var allKeys = "";

while(key.Key != ConsoleKey.Enter)
{
    Console.WriteLine(key.KeyChar);
    allKeys += key.KeyChar;
    key = Console.ReadKey();
}

It reads a key from the input, then it appends it to a string. Nothing to worry about, right? Wrong! On a US International keyboard you can do this:

  • Type ` + a becomes à
  • Type Alt+123 becomes {
  • Type Alt+3355 becomes ←
  • Type ; as if on a Spanish keyboard, becomes ñ

Depending on your keyboard, you will hit a different key for a certain character. Sometimes you will hit a combination of keys. The first combination above is recorded as \0 followed by the letter in the string: first keycode 0 (not in the enum) and then ConsoleKey.A. The total resulting string is now "\0à{←ñ".

The Alt+123/3355 is recorded as keycode 18 (this is the Alt key). The translation of the numeric keys to a character is done by the OS before it is sent to the console.

Typing ; on a US keyboard or ñ on a Spanish keyboard will show you the ConsoleKey.Oem1 (US) and ConsoleKey.Oem3 (Spanish).
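
The loop above stores every KeyChar, including the \0 placeholder. Here is a minimal sketch of filtering those out, assuming (as described above) that a dead key arrives as KeyChar '\0' before the composed character. The KeyCollector class and its Collect and ReadKeys methods are hypothetical names for illustration, not part of the framework. Main feeds it simulated key presses so it can run without a keyboard:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class KeyCollector
{
    // Builds a string from key presses, stopping at Enter and
    // skipping presses that carry no character (KeyChar == '\0').
    public static string Collect(IEnumerable<ConsoleKeyInfo> keys)
    {
        var sb = new StringBuilder();
        foreach (ConsoleKeyInfo key in keys)
        {
            if (key.Key == ConsoleKey.Enter) break;
            if (key.KeyChar == '\0') continue; // dead key, bare modifier, arrow key, ...
            sb.Append(key.KeyChar);
        }
        return sb.ToString();
    }

    // Lazily yields real key presses from the console, without echo.
    public static IEnumerable<ConsoleKeyInfo> ReadKeys()
    {
        while (true) yield return Console.ReadKey(true);
    }

    static void Main()
    {
        // Simulated presses: a dead key, the composed letter, then Enter.
        var simulated = new[]
        {
            new ConsoleKeyInfo('\0', (ConsoleKey)0, false, false, false),
            new ConsoleKeyInfo('\u00E0', ConsoleKey.A, false, false, false), // à
            new ConsoleKeyInfo('\r', ConsoleKey.Enter, false, false, false),
        };
        Console.WriteLine(Collect(simulated).Length); // prints 1
    }
}
```

With a real console, the same method would be called as KeyCollector.Collect(KeyCollector.ReadKeys()). Whether a given layout really delivers '\0' for dead keys depends on the system, so treat the filter as a precaution rather than a guarantee.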

I cannot reproduce your behavior, probably because I don't have your system, but it very much seems that the font you use as the Console font doesn't support non-ASCII characters. On Windows 7 the default font does; I don't know about other Windows versions. It is also possible that the codepage of your console is set incorrectly.
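
The substitutions reported in the question (¿ → ¨, ñ → ¤, ¡ → -, ´ → ï) are consistent with a codepage mismatch: each character's byte in codepage 850 is the byte of the wrong symbol in Windows-1252. The sketch below illustrates that mapping; it is not a confirmed diagnosis of what ReadKey() does internally. CodePageDemo and Misread are hypothetical names, and the provider registration is an assumption about the runtime: it is needed on .NET Core / .NET 5+, while on the .NET Framework these codepages are built in and that line can be removed.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    static CodePageDemo()
    {
        // Assumption: running on .NET Core / .NET 5+, where codepages 850 and
        // 1252 need this provider. Remove this line on the .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Encodes c with one codepage and decodes the resulting byte with another.
    public static char Misread(char c, int writtenAs, int readAs)
    {
        byte[] raw = Encoding.GetEncoding(writtenAs).GetBytes(new[] { c });
        return Encoding.GetEncoding(readAs).GetChars(raw)[0];
    }

    static void Main()
    {
        // ¿ ñ ¡ ´ written as escapes so the source file's encoding cannot interfere.
        foreach (char c in "\u00BF\u00F1\u00A1\u00B4")
        {
            char seen = Misread(c, 850, 1252);
            // Print code points instead of glyphs so the console's own
            // codepage cannot distort the result.
            Console.WriteLine("U+{0:X4} -> U+{1:X4}", (int)c, (int)seen);
        }
    }
}
```

The four lines printed map U+00BF to U+00A8 (¨), U+00F1 to U+00A4 (¤), U+00A1 to U+00AD (a soft hyphen, which shows as -) and U+00B4 to U+00EF (ï), matching the symptoms in the question.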

To summarize
What constitutes a character is dependent on keyboard layout, selected keyboard in international settings, selected language, selected code page in the Console and whether or not combinations of keys are allowed (it gets worse with IME!). To go from KeyChar to normal char is often trivial, but depends on whether your system settings are in sync with each other.

When I run your examples on my system, I do not have the same behavior. But then again, I don't have your system.

Going from a key to a character is tricky business. I suggest you don't rely on your own ability to reinvent what's already in the system. It's good practice to try to see what's going on, but really, move back to ReadLine ;).

EDIT:
I just saw your latest edit. Note that you can have different encodings for input and output (Console.InputEncoding and Console.OutputEncoding). I'd also like to quote the other thread to emphasize that when you switch to Unicode, the codepage doesn't matter anymore. This is the default behavior on recent Windows versions:

If you select a Unicode font, such as Lucida Console or Consolas, then you will be able to see and type Unicode characters on the console, regardless of what chcp says.

Abel
  • Just 2 things for more clarification, and thanks :P. I worked with the source, directly from Microsoft, and disassembled it, so I got the .CS files that contain ReadKey (Console.cs). I know that disassembling was illegal, but it's just to learn what's not documented. ReadKey stores what it captures in a Win32Native.InputRecord variable, and ReadLine does this in a StringBuilder; one of them uses default codepages, and the other uses the system ones. By the way, I'm on Windows 7, Spanish LA. – mishamosher Mar 31 '12 at 18:06
  • So, it's just a matter of using the right encoding, based on the keyboard layout that the system has. I'm not going back to ReadLine because I want something that finishes when a key is pressed, no matter which key, and the only one I know that does this is ReadKey. And... thanks for the extra clarifications, buddy :). – mishamosher Mar 31 '12 at 18:11
  • @mishamosher: you're welcome. You're right about the encoding (see my edit). Still it's important for you to know that you must take care of these combination keys, where `~` + `n` = `ñ`. If you store them directly, regardless of the codepage you have, you will store intermittent `\0` in your string and I'm sure you don't want that. One more thing about encoding: if the font supports Unicode (select Raster Font or Lucida Sans), both input and output support it and you don't have to specify any encoding anymore. Moreover, you're not limited to 255 characters, but the whole Unicode range. – Abel Mar 31 '12 at 21:46
  • Just one thing to be clearer: `Suppose the following, in C#, console application`. Note the Console Application part: what I'm doing is done in the console, so, no fonts, and I'm not writing to any file, everything's in RAM. I know that I'm really overcomplicating things, and a simple ReadLine is easier, but for validations ReadKey does an awesome job. And the `~` + `n` part didn't give me any trouble, I put it in and it works awesome! See my source at http://pastebin.com/F0Xw9GDC - It's just a class, and then you call `ValidacionEntradas.LeerConsola(out myVar)` as a substitute for ReadLine – mishamosher Mar 31 '12 at 22:07
  • Oh, and the comments in the pastebin class are in Spanish, my native language, the one that the teacher demands at the university :P – mishamosher Mar 31 '12 at 22:13
  • @mishamosher: _"the thing that I'm doing is done in console, so, no fonts,"_ >> of course there's a font. How do you think your text appears on screen? Check my text in the answer (below EDIT) and check the referred to question. In the Console, click menu, click Properties, click Fonts and select a Unicode console font. But glad it works for you. – Abel Apr 01 '12 at 08:37
  • O: ! Cool! Thanks a lot for this! – mishamosher Apr 01 '12 at 17:46
  • Wait, I take that back! That's totally `false`. I've already tried it on more than 50 computers, and no matter the font, it doesn't correctly handle things like `ñ`. The only solution that I've found so far is to change the `Console.[Input/Output]Encoding` properties. – mishamosher Apr 19 '12 at 05:19
  • @mishamosher: well, if that works for you, that's good. Unfortunately, your solution doesn't work for me (and I would never go for anything less than Unicode when I need special characters), but I don't see your whole project, so I'm sure I'm missing something. – Abel Apr 19 '12 at 06:05
  • My project only has one cs file, it's done on Visual Studio 2010 with C# Console App and the target is .NET 4 Client Profile. That's all my project, nothing more, and the cs is on http://pastebin.com/hRC7dPzC for reference. Then, just a `Main` entry and a `ValidacionEntradas.LeerConsolaNl(out TheVarThatsGoingToBeWritten)` for testing – mishamosher Apr 19 '12 at 15:28

ReadLine() reconfigures the codepage to properly use the extended ASCII and Unicode characters. ReadKey() leaves it at the default OEM codepage (850 on my system; the en-US default is 437).

Just use a codepage that prints the characters you want, and that's all. Refer to http://en.wikipedia.org/wiki/Code_page for some of them :)

So, for the Ñ key press, the solution is this:

Console.OutputEncoding = System.Text.Encoding.GetEncoding(1252); // 28591 (ISO-8859-1) is also valid for the `Ñ` key, and others too
string data = string.Empty;
ConsoleKeyInfo cki;
cki = Console.ReadKey(true);
data += cki.KeyChar;
Console.WriteLine(data);
Console.ReadKey(true);

Simple :)

And a side note: in some cases it's also necessary to reconfigure the Console.InputEncoding property!

Also, note that if you select another font for the console (Lucida Console/Consolas), this problem STILL happens. Many thanks to user Abel for this: he pointed to changing the font as the solution, which led me to discover that this is false.

mishamosher
  • This probably only works on Spanish keyboards that have the ñ as single keypress. It doesn't work for any other keyboard that requires you to type `~` + `n` or Alt+Num combination. Nor will it work with characters not in Windows-1252, which is the majority of the Unicode range. I don't know _why_ you choose a non-Unicode solution, but maybe this _is_ the only thing that works with Spanish keyboards on consoles. It's unfortunate that it limits you so much and makes it hard to ship your solution to a wide range of customers. – Abel Apr 19 '12 at 06:11
  • Good point, and no, it doesn't handle the Alt+Num entries. And I repeat, it works with key combos like `´` and `i`, and many others; actually, it works with any combo where the first key pressed isn't immediately written to the display (`¨`, ```, `´`, etc., in most cases) – mishamosher Apr 19 '12 at 15:19
  • And why I chose a non-Unicode solution... It's just for two reasons: interoperability with really old operating systems, and I'm not that deep into C# because I don't have much free time. Please remember that this is only for validations and entries in English or Spanish, nothing more; it's just for university purposes. I'm aware that for a to-be-sold product I'd just choose WinForms from the beginning – mishamosher Apr 19 '12 at 15:22