
I have a string that is really a sequence of UTF-encoded characters written as hex digits, with the \x escape prefixes stripped out: "537465616d6c696e6564". With the prefixes in place it would read \x53\x74\x65 [...]

I've tried using a regex to insert \x at the correct positions, encoding the result as bytes, and reading it back as UTF, but to no avail.

What's the most effective way of turning the ASCII string into readable UTF in C#?

2 Answers


As I understand it, you have a string "537465616d6c696e6564" that actually represents char[] chars = { '\x53', '\x74', ... }.

First, convert this string to an array of bytes (see "How can I convert a hex string to a byte array?").

For your convenience:

public static byte[] StringToByteArray(string hex) {
    return Enumerable.Range(0, hex.Length)
                     .Where(x => x % 2 == 0)
                     .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
                     .ToArray();
}

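Next, pick the encoding. There are several UTF encodings (UTF-8, UTF-16). C# strings use UTF-16 internally, but what matters here is how the bytes themselves were encoded. If they are UTF-16 (little-endian), decode the byte array returned by StringToByteArray with:

string str = System.Text.Encoding.Unicode.GetString(array);

If that produces incorrect characters, use Encoding.UTF8 instead. For the sample string, each byte is a single ASCII-range character (0x53 is 'S', 0x74 is 't'), so UTF-8 gives readable text, whereas UTF-16 would pair the ten bytes into five unrelated CJK characters.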
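The two steps above can be sketched as one small program. This is only an illustration, assuming the bytes are UTF-8; the names HexDecodeDemo and DecodeHex are hypothetical and not part of any library.

```csharp
using System;
using System.Linq;
using System.Text;

public static class HexDecodeDemo
{
    // Hypothetical helper: parses pairs of hex digits into bytes,
    // then decodes those bytes as UTF-8 (an assumption about the data).
    public static string DecodeHex(string hex)
    {
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even number of digits.", nameof(hex));

        byte[] bytes = Enumerable.Range(0, hex.Length / 2)
                                 .Select(i => Convert.ToByte(hex.Substring(i * 2, 2), 16))
                                 .ToArray();

        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(DecodeHex("537465616d6c696e6564")); // prints "Steamlined"
    }
}
```

The length check is an addition for safety: without it, an odd-length input would fail inside Substring with a less helpful exception.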

csharpfolk

I don't know much about string encodings, but assuming that your original string is the hex representation of a series of bytes, you could do something like this:

using System;
using System.Text;

class Program
{
    private const string encoded = "537465616d6c696e6564";

    static void Main(string[] args)
    {
        byte[] bytes = StringToByteArray(encoded);

        string text = Encoding.ASCII.GetString(bytes);

        Console.WriteLine(text);
        Console.ReadKey();
    }

    // From https://stackoverflow.com/questions/311165/how-do-you-convert-byte-array-to-hexadecimal-string-and-vice-versa
    public static byte[] StringToByteArray(String hex)
    {
        int NumberChars = hex.Length;
        byte[] bytes = new byte[NumberChars / 2];
        for (int i = 0; i < NumberChars; i += 2)
            bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
        return bytes;
    }
}

If you later want to encode the result as UTF-8, you can use:

Encoding.UTF8.GetBytes(text);
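One way to check the round trip is to re-encode the decoded text and format the bytes back into hex digits. The sketch below assumes UTF-8 throughout; RoundTripDemo and ToHex are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Hypothetical helper: encodes text as UTF-8 and returns lowercase hex digits.
    public static string ToHex(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        // BitConverter.ToString yields "53-74-65-...", so strip the dashes
        // and lowercase the result to match the format of the original input.
        return BitConverter.ToString(utf8).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        Console.WriteLine(ToHex("Steamlined")); // prints "537465616d6c696e6564"
    }
}
```

If the output matches the original hex string, the decoding step chose an encoding consistent with the data.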

I've taken one implementation of the StringToByteArray conversion but there are many. If performance is important, you may want to choose a more efficient one. See the links below for more info.
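As one possible alternative, assuming the project targets .NET 5 or later, the framework's own Convert.FromHexString can replace the hand-written conversion. The class and method names below (FromHexStringDemo, Decode) are hypothetical, and UTF-8 is again an assumption about the data.

```csharp
using System;
using System.Text;

public static class FromHexStringDemo
{
    // Hypothetical helper built on Convert.FromHexString (available from .NET 5).
    public static string Decode(string hex)
    {
        // Throws FormatException if the input has odd length or non-hex characters.
        byte[] bytes = Convert.FromHexString(hex);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(Decode("537465616d6c696e6564")); // prints "Steamlined"
    }
}
```

This avoids allocating a substring per byte, which is the main cost in the Substring-based versions; on older frameworks the loop shown above remains a reasonable choice.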

On byte to string conversion (some interesting discussions on performance):

On strings in .NET

barjac