Convert in Another Format With Character Encoding

Question

I am using an Oracle 10g database for a C# application. The issue is that the NVARCHAR column in the database doesn't save text in languages other than English. Since NVARCHAR supports Unicode, this should work. As a workaround, I tried a simple conversion method from a tutorial:

Encoding ascii = Encoding.ASCII;
Encoding unicode = Encoding.Unicode;

//Convert the string into a byte[].
byte[] unicodeBytes = ascii.GetBytes("আমার সোনার বাংলা!"); //Text to show

//Perform the conversion from one encoding to the other.
byte[] asciiBytes = Encoding.Convert(ascii, unicode, unicodeBytes);
char[] asciiChars = new char[ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)];
ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0);
string asciiString = new string(asciiChars);
Console.WriteLine(asciiString);

Console.ReadKey();

This may seem silly, but I was expecting to be able to show the text in the console app with the code above. Instead, it shows question marks (??????). Is there any way I can show the text, or at least save it in some other format, so that I can retrieve it and display it properly in the front end?

You are converting it back to ASCII; remove that part. If the Oracle data provider (ODP) is the problem, please specify the driver and version, and include the database-related part. Currently, the oracle10g tag would not apply. — Cee McSharpface, Apr 16 '18 at 07:29
`byte[] unicodeBytes = ascii.GetBytes(...)` < doesn't that look suspicious to you? — Evk, Apr 16 '18 at 07:30
ASCII can't convert that text to bytes. ASCII only contains the basic Latin alphabet (no accented letters), the digits, and some punctuation and brackets. Why are you even attempting to use it? — Nyerguds, Apr 16 '18 at 07:31
Also note, there is no such encoding as "unicode". `Encoding.Unicode` is actually UTF-16LE. — Nyerguds, Apr 16 '18 at 07:35
Possibly useful: [How to write Unicode characters to the console?](https://stackoverflow.com/q/5750203/3744182). — dbc, Apr 16 '18 at 07:36
Thanks for the clarification, @Nyerguds. I am new to this; apologies. Is there any way I can make it work? — user8512043, Apr 16 '18 at 07:36
Yes. Don't convert it at all. NVARCHAR is text, not bytes. You say it supports unicode, so it should support any language. Which characters are supported on Oracle depends entirely on the character set configured in the database. — Nyerguds, Apr 16 '18 at 07:37
If it indeed is a problem, and you need to use pure ASCII, consider using Base64 encoding. — Nyerguds, Apr 16 '18 at 07:41
Let me show an example. Right now, I am converting the text with this tool: http://www.banglaconverter.net/tools.php?f=Unicode-To-Bijoy, and it displays perfectly, but I am unable to do the same conversion directly at the application level. — user8512043, Apr 16 '18 at 07:44
[Bijoy is not ASCII](https://en.wikipedia.org/wiki/Bengali_input_methods). From what I gleaned from this Wikipedia page, it seems to be an encoding that uses the code points of the ASCII range but depends on a **certain font** to show Bengali. We probably need somebody with good domain knowledge of both Bengali and character encoding to answer this properly, but I recommend adding your Oracle data provider-specific code anyway, and staying in Unicode entirely if you can! The question is not silly at all; character sets are in a league with HTML parsing and time zones. — Cee McSharpface, Apr 16 '18 at 13:00
Thanks very much, @dlatikay, for your understanding. I'll be glad to follow your instructions. — user8512043, Apr 16 '18 at 16:10
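The comments above boil down to one point: ASCII cannot represent Bangla, while UTF-8 and UTF-16 can. The following minimal C# sketch illustrates this (written with C# 9 top-level statements for brevity; on .NET Framework, put the statements in a `Main` method). It round-trips the question's sample text through each encoding, shows that `Encoding.Unicode` is little-endian UTF-16, and wraps the text in Base64 for the case where only ASCII can be stored. `RoundTrip` is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Text;

// Sample text from the question. A .NET string already holds it as UTF-16.
string text = "আমার সোনার বাংলা!";

// Hypothetical helper: string -> bytes -> string through the given encoding.
string RoundTrip(Encoding encoding, string s) => encoding.GetString(encoding.GetBytes(s));

// ASCII has no Bangla characters: its default replacement fallback turns each
// of them into '?', which is where the question marks in the question come from.
string viaAscii = RoundTrip(Encoding.ASCII, text);
Console.WriteLine(viaAscii);

// UTF-8 and UTF-16 can represent every character, so nothing is lost.
// Encoding.Unicode is UTF-16 little-endian.
string viaUtf8 = RoundTrip(Encoding.UTF8, text);
string viaUtf16 = RoundTrip(Encoding.Unicode, text);
Console.WriteLine(viaUtf8 == text && viaUtf16 == text); // True

// Little-endian: 'A' (U+0041) is written low byte first.
byte[] utf16A = Encoding.Unicode.GetBytes("A");
Console.WriteLine(BitConverter.ToString(utf16A)); // 41-00

// Only if a channel truly accepts nothing but ASCII: Base64 of the UTF-8
// bytes is ASCII-only and reversible (but no longer human-readable).
string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(text));
string restored = Encoding.UTF8.GetString(Convert.FromBase64String(base64));
Console.WriteLine(restored == text); // True
```

Base64 is a last resort here: the stored value can no longer be searched or sorted as text, so keeping the string as Unicode end to end remains the better option.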

Cee McSharpface · Accepted Answer · 2018-04-16T18:01:28.370

If you can use Unicode (and you should; hey, it's 2018), then it is best to avoid Bijoy altogether. Process and store everything that is a string as System.String in .NET and as NVARCHAR in Oracle.
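To illustrate why no Bijoy or ASCII step is needed on the .NET side: a `System.String` already holds Bangla as real Unicode characters. This small sketch (C# 9 top-level statements) prints the code points of "আমার", the first word of the sample text, written with escapes so that the values are explicit.

```csharp
using System;
using System.Linq;

// "আমার" spelled with escapes: BENGALI LETTER AA, LETTER MA, VOWEL SIGN AA, LETTER RA.
string word = "\u0986\u09AE\u09BE\u09B0";

// Each of these characters fits in one UTF-16 code unit, so listing the
// chars lists the code points.
string codePoints = string.Join(" ", word.Select(c => $"U+{(int)c:X4}"));
Console.WriteLine(codePoints); // U+0986 U+09AE U+09BE U+09B0

// All of them lie in the Unicode Bengali block (U+0980..U+09FF): the string
// can be handed to the data layer as it is, with no re-encoding step.
bool allBengali = word.All(c => c >= '\u0980' && c <= '\u09FF');
Console.WriteLine(allBengali); // True
```

Whether the database keeps these characters then depends on the column's character set, as noted in the comments, not on any conversion in the application.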

The Windows console can handle Unicode without any problems, provided we observe two important prerequisites that the documentation clearly states:

Support for Unicode [...] requires a font that has the glyphs needed to render that character. To successfully display Unicode characters to the console, the console font must be set to a [...] font such as Consolas or Lucida Console

This is something you must ensure in the Windows settings, independently of your .NET application.

The second prerequisite, emphasis mine:

[...] Console class supports UTF-8 encoding [...] Beginning with the .NET Framework 4.5, the Console class also supports UTF-16 encoding [...] To display Unicode characters to the console, you set the OutputEncoding property to either UTF8Encoding or UnicodeEncoding.

What the documentation does not say is that none of the fonts selectable from the properties menu of the console window will normally contain glyphs for every alphabet in the world. If you need right-to-left support, for example for Hebrew or Arabic, you're out of luck.

If the program runs on a Windows version without Bangla fonts preinstalled, follow this tutorial to install the Bangla Language Interface Pack (KB3180030).

Then apply this answer to our problem as follows:

1. Open the Windows Registry Editor.
2. Navigate to HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont.
3. Create a new string value, give it an unused name such as "000", and set its data to "Bangla Medium".
4. Reboot the PC.
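Before adding the value, it can help to see which console fonts are already registered and which name is still free. The following read-only C# sketch is an illustration under assumptions: it uses the `Microsoft.Win32.Registry` API (Windows only), it assumes the existing value names follow the "0", "00", "000" pattern, and `NextFreeZeroName` is a hypothetical helper. Writing the value is still done in the Registry Editor as described, because it requires administrator rights.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Win32;

// Registry path of the console's TrueType font list (note the space in "Windows NT").
const string ConsoleFontsKey =
    @"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont";

// Hypothetical helper: the shortest all-zero value name ("0", "00", "000", ...)
// that is not taken yet. Assumes the zero-padding naming convention.
string NextFreeZeroName(IEnumerable<string> existingNames)
{
    var used = new HashSet<string>(existingNames);
    string candidate = "0";
    while (used.Contains(candidate))
        candidate += "0";
    return candidate;
}

// Read-only listing of the registered console fonts; does nothing off Windows.
if (Environment.OSVersion.Platform == PlatformID.Win32NT)
{
    // OpenSubKey returns null if the key does not exist; disposing null is skipped.
    using RegistryKey key = Registry.LocalMachine.OpenSubKey(ConsoleFontsKey);
    if (key != null)
    {
        string[] names = key.GetValueNames();
        foreach (string name in names)
            Console.WriteLine($"{name} = {key.GetValue(name)}");
        Console.WriteLine($"Next free name: {NextFreeZeroName(names)}");
    }
}
```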

Now set the console font to "Bangla" via the console window's menu: choose the last item, "Properties", then the second tab, "Font".

Finally, get rid of all that encoding back and forth and simply write:

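using System;
using System.Text;

namespace so49851713
{
    class Program
    {
        public static void Main()
        {
            // U+263A (a smiley) followed by Bangla text; the string is already Unicode.
            var mbb = "\u263Aআমার সোនার বাংলা!";

            // Prepare the console once per process, before writing any output.
            Console.OutputEncoding = Encoding.UTF8;

            // No manual conversion needed: write the string as it is.
            Console.WriteLine(mbb);
            Console.ReadLine();
        }
    }
}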
