
I have a Unicode hex string like this:

0030003100320033

It should turn into 0123. This is a simple case containing only the digits 0123, but other strings contain letters and non-ASCII Unicode characters as well. How can I turn this type of Unicode hex string into a string in C#?

For the normal US character set, the first byte is always 00, so 0031 is "1" in ASCII, 0032 is "2", and so on.

When it's an actual Unicode character beyond ASCII, such as Arabic or Chinese, the first byte is not 00; for instance, Arabic characters are 06XX, like 0663.

I need to be able to turn this type of hex string into a regular C# string.
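To make the layout concrete, here is a minimal sketch of the reverse direction: producing this kind of hex string from a .NET string. It assumes each group of four hex digits is one UTF-16 code unit, written high byte first; the class and method names (`Utf16Hex.ToHex`) are hypothetical and only for illustration.

```csharp
using System.Text;

// Hypothetical helper (not from the question): renders each UTF-16 code unit
// of a string as four uppercase hex digits, i.e. the format described above.
public static class Utf16Hex
{
    public static string ToHex(string text)
    {
        var sb = new StringBuilder(text.Length * 4);
        foreach (char c in text)
        {
            // A char is one UTF-16 code unit; "X4" pads to four hex digits,
            // so ASCII characters get a leading "00" byte.
            sb.Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

With this sketch, `Utf16Hex.ToHex("0123")` yields `0030003100320033`, and the Arabic-Indic digit three (`"\u0663"`) yields `0663`, matching the two cases described in the question.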

  • Figured it out myself, please delete the question –  Dec 22 '18 at 23:11
  • There's no reason to delete the question if it can benefit future visitors to the site. Similarly, if you figured out how to achieve this, you should answer your own question and supply the solution, so that future visitors can find the answer to their question. Stack Overflow's goal isn't just to answer questions, but to have those answers persist for the common good. – Avner Shahar-Kashtan Dec 23 '18 at 09:02

3 Answers


There are several encodings that can represent Unicode, of which UTF-8 is today's de facto standard. However, your example is actually a string representation of UTF-16 using the big-endian byte order. You can convert your hex string back into bytes, then use Encoding.BigEndianUnicode to decode this:

using System;
using System.Text;

public static class Program
{
    public static void Main()
    {
        var bytes = StringToByteArray("0030003100320033");
        var decoded = Encoding.BigEndianUnicode.GetString(bytes);
        Console.WriteLine(decoded);   // gives "0123"
    }

    // https://stackoverflow.com/a/311179/1149773
    public static byte[] StringToByteArray(string hex)
    {
        // Each pair of hex digits becomes one byte.
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < hex.Length; i += 2)
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        return bytes;
    }
}

Since Char in .NET represents a UTF-16 code unit, this answer should give identical results to Slai's, including for surrogate pairs.
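On .NET 5 or later, the hand-written `StringToByteArray` helper can likely be replaced with the built-in `Convert.FromHexString`. The sketch below assumes that runtime version; the class name `HexUtf16Decoder` is hypothetical. It also exercises the surrogate-pair case: U+1F600 is stored in UTF-16 as the two code units D83D and DE00.

```csharp
using System;
using System.Text;

// Sketch assuming .NET 5 or later, where Convert.FromHexString is available.
public static class HexUtf16Decoder
{
    public static string Decode(string hex)
    {
        // Parses pairs of hex digits into bytes; throws FormatException
        // if the length is odd or a non-hex character appears.
        byte[] bytes = Convert.FromHexString(hex);

        // Every two bytes form one big-endian UTF-16 code unit.
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

For example, `HexUtf16Decoder.Decode("D83DDE00")` should return a string of length 2 (one surrogate pair), and `char.ConvertToUtf32(result, 0)` on it gives 0x1F600.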

Douglas

A shorter, less efficient alternative:

// Needs: using System; using System.Text.RegularExpressions;
string decoded = Regex.Replace("0030003100320033", "....", m => (char)Convert.ToInt32(m + "", 16) + "");
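The regex above casts every four-character match to a `char`. A sketch of the same idea without the regex engine, which likely avoids some overhead, is a plain loop over groups of four hex digits. The class name `HexUtf16Loop` is hypothetical, and the length check is an added assumption about how malformed input should be handled.

```csharp
using System;
using System.Text;

// Sketch: reads the input four hex digits at a time and casts each group
// to a char (one UTF-16 code unit), the same idea as the regex one-liner.
public static class HexUtf16Loop
{
    public static string Decode(string hex)
    {
        // Assumption: reject input that cannot be split into 4-digit groups.
        if (hex.Length % 4 != 0)
            throw new ArgumentException("Length must be a multiple of 4.", nameof(hex));

        var sb = new StringBuilder(hex.Length / 4);
        for (int i = 0; i < hex.Length; i += 4)
        {
            // Convert.ToUInt16 with base 16 parses e.g. "0663" as 0x0663.
            sb.Append((char)Convert.ToUInt16(hex.Substring(i, 4), 16));
        }
        return sb.ToString();
    }
}
```

Because each group becomes exactly one `char`, surrogate pairs pass through as two consecutive code units, just as in the regex version.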
Slai

You could try this solution:

using System;
using System.Text;

public static class Program
{
    public static void Main()
    {
        string hexString = "0030003100320033"; // Pairs of hex digits
        //string hexStrWithDash = "00-30-00-31-00-32-00-33"; // Pairs separated by dashes, as produced by BitConverter.ToString()
        byte[] data = ParseHex(hexString);
        string result = Encoding.BigEndianUnicode.GetString(data);
        Console.Write("Data: {0}", result);
    }

    public static byte[] ParseHex(string hexString)
    {
        // Strip the dashes that BitConverter.ToString() inserts, if any.
        hexString = hexString.Replace("-", "");
        byte[] output = new byte[hexString.Length / 2];
        for (int i = 0; i < output.Length; i++)
        {
            output[i] = Convert.ToByte(hexString.Substring(i * 2, 2), 16);
        }
        return output;
    }
}
pim3nt3l
  • Nope, it returns "\00\01\02\03" string, not 0123 –  Dec 22 '18 at 22:45
  • @OracleJava is correct. The bug is not apparent from the console output, since `\0` characters are not displayed. However, `result.Length` gives 8, not 4. – Douglas Dec 23 '18 at 12:51
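A small sketch reproducing the point made in these comments: decoding the same eight bytes with `Encoding.ASCII` yields one `char` per byte, including the `\0` high bytes, whereas `Encoding.BigEndianUnicode` pairs them into four characters. The class name `WrongEncodingDemo` is hypothetical.

```csharp
using System.Text;

// Illustrates the bug discussed in the comments: decoding UTF-16 bytes
// with the wrong Encoding.
public static class WrongEncodingDemo
{
    // The bytes parsed from "0030003100320033".
    public static byte[] Bytes() =>
        new byte[] { 0x00, 0x30, 0x00, 0x31, 0x00, 0x32, 0x00, 0x33 };

    // One char per byte: '\0', '0', '\0', '1', ... (8 chars).
    public static string AsAscii() => Encoding.ASCII.GetString(Bytes());

    // Two bytes per char, high byte first: "0123" (4 chars).
    public static string AsBigEndianUnicode() => Encoding.BigEndianUnicode.GetString(Bytes());
}
```

The `\0` characters are invisible in console output, which is why checking `Length` (8 versus 4) exposes the problem.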
  • Sorry, my mistake: I had used ASCII encoding instead of BigEndianUnicode – pim3nt3l Jan 11 '19 at 20:29