I am experimenting with the Console Output Options and have noticed this:

    Console.OutputEncoding = System.Text.Encoding.Unicode;
    Console.WriteLine("\u266F");
    Console.WriteLine("\u6F26");

So I set the encoding to Unicode (UTF-16LE) and I am trying to output the sharp character. "\u266F" is the sharp in UTF-8, and "\u6F26" is the sharp in UTF-16LE.

You can see the output in the screenshot.

So I have two questions:

  1. Why does "\u266F" output a sharp when it is in UTF-8 and I have set the encoding to Unicode?

  2. Why doesn't "\u6F26" output a sharp when it is in UTF-16LE and the encoding is set to UTF-16LE too?

Eric Movsessian
  • A C# `System.String` is always encoded in UTF-16. When you set [`Console.OutputEncoding`](https://learn.microsoft.com/en-us/dotnet/api/system.console.outputencoding?view=net-6.0) you are controlling how the console should **encode** the incoming strings for output. It doesn't change how it interprets the encoding of the strings themselves; that is always UTF-16. See: https://csharpindepth.com/Articles/Strings and [Unicode Support for the Console](https://learn.microsoft.com/en-us/dotnet/api/system.console?view=net-6.0#Unicode). – dbc Mar 20 '22 at 19:56
  • Thank you very much, it is clear. I would like to know why the console outputs question marks when I do not write the line `Console.OutputEncoding = System.Text.Encoding.Unicode;`. Is there anything wrong with the default encoding of the console? – Eric Movsessian Mar 20 '22 at 20:18
  • Keep in mind that the Windows Console is very, very old. Maybe 4 decades old? So there is a lot of legacy stuff going on. Before .NET, my (possibly incorrect) recollection is that the computer itself would have an active encoding, and all C/C++ strings would be interpreted according to that encoding. But things have evolved over the years, and now `Console.OutputEncoding` effectively specifies the subset of Unicode that the Console currently outputs, with anything not supported getting mapped to `?`. – dbc Mar 20 '22 at 20:39
  • From the [docs](https://learn.microsoft.com/en-us/dotnet/api/system.console?view=net-6.0#Unicode): *In general, the console reads input and writes output by using the current console code page, which the system locale defines by default. A code page can handle only a subset of available Unicode characters, so if you try to display characters that are not mapped by a particular code page, the console won't be able to display all characters or represent them accurately.* – dbc Mar 20 '22 at 20:40
  • Here, you should be able to find the answer: https://stackoverflow.com/a/5750227/4795779 – Tom Mar 16 '23 at 20:43
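
The points in the comments above can be checked with a small sketch. It assumes .NET 6 or later with top-level statements; `Hex` is a hypothetical helper introduced here for illustration, and `Encoding.ASCII` is only a stand-in for a limited console code page, not the code page any particular console actually uses. The key idea is that a `\uXXXX` escape names a UTF-16 code unit (here, the code point itself), not a byte sequence in some encoding, so `"\u266F"` is the sharp regardless of `Console.OutputEncoding`, while `"\u6F26"` is a different character whose bytes merely look like the byte-swapped sharp.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original post): formats bytes as hex.
static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

// The escape names the code point U+266F (MUSIC SHARP SIGN), not bytes.
string sharp = "\u266F";

// The same character serialized under different encodings.
Console.WriteLine(Hex(Encoding.UTF8.GetBytes(sharp)));             // E2-99-AF
Console.WriteLine(Hex(Encoding.Unicode.GetBytes(sharp)));          // 6F-26 (UTF-16LE)
Console.WriteLine(Hex(Encoding.BigEndianUnicode.GetBytes(sharp))); // 26-6F (UTF-16BE)

// "\u6F26" is a different code point (a CJK ideograph), not a
// little-endian spelling of the sharp.
string other = "\u6F26";
Console.WriteLine(Hex(Encoding.Unicode.GetBytes(other)));          // 26-6F
Console.WriteLine(sharp == other);                                 // False

// Decoding the UTF-16LE bytes 6F 26 gives back the sharp.
string decoded = Encoding.Unicode.GetString(new byte[] { 0x6F, 0x26 });
Console.WriteLine(decoded == sharp);                               // True

// Stand-in for a limited code page: ASCII cannot represent U+266F,
// so the encoder substitutes '?' (0x3F).
byte[] asciiBytes = Encoding.ASCII.GetBytes(sharp);
Console.WriteLine(Hex(asciiBytes));                                // 3F
Console.WriteLine(Encoding.ASCII.GetString(asciiBytes));           // ?
```

The last two lines illustrate, under that stand-in assumption, why question marks can appear when the output encoding cannot map a character: the substitution happens at encoding time, before anything reaches the screen.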
