8

How can I convert a Unicode value to its equivalent string?

For example, I have "రమెశ్", and I need a function that accepts this Unicode value and returns a string.

I was looking at the System.Text.Encoding.Convert() function, but that does not take in a Unicode value; it takes two encodings and a byte array.

I basically have a byte array that I need to save in a string field, and later I need to convert that string back to a byte array.

So I use ByteConverter.GetString(byteArray) to save the byte array to a string, but I can't get it back to a byte array.

Peter Mortensen
Bob
  • What format do you have the "Unicode" in? – György Andrasek Nov 15 '10 at 12:14
  • Well, I'm using the UnicodeEncoding.GetString(byteArray) method, which returns a string from a byte array. When I inspect the string, it has a load of weird-looking symbols! – Bob Nov 15 '10 at 12:22
  • "Unicode" is not an encoding. The `UnicodeEncoding` in .NET should have really been called `UTF16Encoding` - shame on Microsoft. :P – Vilx- Nov 15 '10 at 12:37
  • First of all, stop whatever you're doing and read [Joel's article about Unicode](http://joelonsoftware.com/articles/Unicode.html). Don't even read this answer further. Go there **NOW**! Nope, no peeking, article first! ... OK, done? Then you should be able to spot your mistake and the right answer yourself. If not, then ask yourself: what encoding is my "string" (byte array) in? – Vilx- Nov 15 '10 at 12:33
  • Ha! Thanks :) Using Encoding.Default worked. I skimmed over the article ... I'll read it fully when I have more time though. Cheers – Bob Nov 15 '10 at 16:50
  • @Bob - but really, DO read it. It was an eye opener for me and many others. It may take 20min to read it, but it will be very well spent 20min. After that you will no longer be lost among different character sets, encodings, and mysterious symbols cropping up where they shouldn't. – Vilx- Nov 15 '10 at 21:11

7 Answers

11

Use `.ToString()`:

this.Text = ((char)0x00D7).ToString();
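
A `(char)` cast only covers code points up to U+FFFF, because a .NET `char` is a single UTF-16 code unit. As a minimal sketch of how higher code points could be handled, the example below uses `char.ConvertFromUtf32`, which returns a string and produces a surrogate pair when needed. The class name `CodePointDemo` and the helper `FromCodePoint` are made up for illustration.

```csharp
using System;
using System.Text;

public static class CodePointDemo
{
    // Hypothetical helper: turns a Unicode code point into a string.
    public static string FromCodePoint(int codePoint)
    {
        // char.ConvertFromUtf32 returns a surrogate pair for code points
        // above U+FFFF, which a plain (char) cast cannot represent.
        return char.ConvertFromUtf32(codePoint);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(FromCodePoint(0x00D7));          // multiplication sign
        Console.WriteLine(FromCodePoint(0x0C30));          // Telugu letter RA
        Console.WriteLine(FromCodePoint(0x1F600).Length);  // 2 UTF-16 code units
    }
}
```

Whether the characters display correctly still depends on the console or control font.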
shA.t
Andrew
6

Try the following:

byte[] bytes = ...;

string convertedUtf8 = Encoding.UTF8.GetString(bytes);
string convertedUtf16 = Encoding.Unicode.GetString(bytes); // For UTF-16

The other way around is using `GetBytes()`:

byte[] bytesUtf8 = Encoding.UTF8.GetBytes(convertedUtf8);
byte[] bytesUtf16 = Encoding.Unicode.GetBytes(convertedUtf16);

In the Encoding class, there are more variants if you need them.
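
The round trip only works when the bytes really are text in the encoding you pick. The sketch below illustrates this under the assumption that the byte array may hold arbitrary binary data: decoding such bytes as UTF-8 can be lossy, while Base64 (`Convert.ToBase64String` / `Convert.FromBase64String`) is one way to store any byte array in a string field. The class and helper names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: decode as UTF-8, re-encode, and compare with the input.
    public static bool SurvivesUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data)).SequenceEqual(data);

    // Hypothetical helper: Base64 round trip, which works for any byte values.
    public static bool SurvivesBase64(byte[] data) =>
        Convert.FromBase64String(Convert.ToBase64String(data)).SequenceEqual(data);

    public static void Main()
    {
        byte[] textBytes = Encoding.UTF8.GetBytes("రమెశ్"); // genuine UTF-8 text
        byte[] raw = { 0xFF, 0x00, 0x80 };                   // not valid UTF-8

        Console.WriteLine(SurvivesUtf8(textBytes)); // True
        Console.WriteLine(SurvivesUtf8(raw));       // False: invalid bytes become U+FFFD
        Console.WriteLine(SurvivesBase64(raw));     // True
    }
}
```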

Pieter van Ginkel
2

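To convert a string to a Unicode (UTF-16) string, you can do the following. Note the BytesToString function, which reads the bytes through a `StreamReader` instead of calling `Encoding.GetString` directly. Be aware that a `StreamReader` created this way decodes the bytes as UTF-8 unless it detects a byte order mark.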

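// Requires: using System.IO; using System.Text;

private string BytesToString(byte[] bytes)
{
  // StreamReader decodes as UTF-8 by default (unless it detects a byte order mark).
  using (var stream = new MemoryStream(bytes))
  using (var reader = new StreamReader(stream))
  {
    return reader.ReadToEnd();
  }
}

private string ToUnicode(string s)
{
  // GetBytes yields UTF-16 (little-endian) bytes without a byte order mark.
  return BytesToString(new UnicodeEncoding().GetBytes(s));
}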
Dan Sutton
1

UTF8Encoding Class

   UTF8Encoding uni = new UTF8Encoding();
   Console.WriteLine( uni.GetString(new byte[] { 1, 2 }));
Ramiz Uddin
0

There are different types of encoding. You can try some of them to see whether your byte stream gets converted correctly:

System.Text.ASCIIEncoding encodingASCII = new System.Text.ASCIIEncoding();
System.Text.UTF8Encoding encodingUTF8 = new System.Text.UTF8Encoding();
System.Text.UnicodeEncoding encodingUNICODE = new System.Text.UnicodeEncoding();

var ascii = string.Format("{0}: {1}", encodingASCII.ToString(), encodingASCII.GetString(textBytesASCII));
var utf =   string.Format("{0}: {1}", encodingUTF8.ToString(), encodingUTF8.GetString(textBytesUTF8));
var unicode = string.Format("{0}: {1}", encodingUNICODE.ToString(), encodingUNICODE.GetString(textBytesCyrillic));

Have a look here as well: http://george2giga.com/2010/10/08/c-text-encoding-and-transcoding/.
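
When trying encodings, a wrong guess often produces garbage silently instead of failing. As a rough sketch, a strict decoder can at least reject byte sequences that are not well-formed UTF-8: the `UTF8Encoding(bool, bool)` constructor accepts a `throwOnInvalidBytes` flag. The class `StrictDecodeDemo` and the helper `TryDecodeUtf8` are hypothetical names used only for this example.

```csharp
using System;
using System.Text;

public static class StrictDecodeDemo
{
    // Hypothetical helper: returns true only if the bytes are well-formed UTF-8.
    public static bool TryDecodeUtf8(byte[] bytes, out string text)
    {
        // Second argument: throw on invalid bytes instead of substituting U+FFFD.
        var strict = new UTF8Encoding(false, true);
        try
        {
            text = strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryDecodeUtf8(new byte[] { 0x41, 0x42 }, out _)); // True
        Console.WriteLine(TryDecodeUtf8(new byte[] { 0xFF }, out _));       // False
    }
}
```

A successful decode does not prove the guess was right, since many byte sequences are valid in more than one encoding.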

Peter Mortensen
Giorgio Minardi
-1
var ascii = $"{new ASCIIEncoding().ToString()}: {new ASCIIEncoding().GetString(textBytesASCII)}";
var utf = $"{new UTF8Encoding().ToString()}: {new UTF8Encoding().GetString(textBytesUTF8)}";
var unicode = $"{new UnicodeEncoding().ToString()}: {new UnicodeEncoding().GetString(textBytesCyrillic)}";
GooliveR
-1

I wrote a loop that converts `\uXXXX` escape sequences in a string to the characters they represent:

string stringWithUnicodeSymbols = @"{""id"": 10440119, ""photo"": 10945418, ""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
var splitted = Regex.Split(stringWithUnicodeSymbols, @"\\u([a-fA-F\d]{4})");
string outString = "";
foreach (var s in splitted)
{
    try
    {
        if (s.Length == 4)
        {
            var decoded = ((char) Convert.ToUInt16(s, 16)).ToString();
            outString += decoded;
        }
        else
        {
            outString += s;
        }
    }
    catch (Exception e)
    {
        outString += s;
    }
}
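
The same idea can be sketched more compactly with `Regex.Replace` and a match evaluator, which avoids guessing from the segment length whether a piece was an escape. This is only an illustration: `EscapeDecoder` and `DecodeUnicodeEscapes` are hypothetical names, and the pattern does not account for escaped backslashes, so a real JSON parser is likely the safer choice for JSON input.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Hypothetical helper: replaces each \uXXXX escape with the UTF-16 code unit it names.
    public static string DecodeUnicodeEscapes(string input)
    {
        return Regex.Replace(
            input,
            @"\\u([0-9a-fA-F]{4})",
            // Parse the four hex digits and cast the value to a char.
            m => ((char)ushort.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        string json = @"{""first_name"": ""\u0415\u0432\u0433\u0435\u043d\u0438\u0439""}";
        Console.WriteLine(DecodeUnicodeEscapes(json));
    }
}
```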
Dmitrii Matunin