14

Possible Duplicate:
How do you convert Byte Array to Hexadecimal String, and vice versa?

I need an efficient and fast way to do this conversion. I have tried two different ways, but they are not efficient enough for me. Is there any other quick method to accomplish this in a real-time fashion for an application with huge data?

  public byte[] StringToByteArray(string hex)
    {
        return Enumerable.Range(0, hex.Length / 2).Select(x => Byte.Parse(hex.Substring(2 * x, 2), NumberStyles.HexNumber)).ToArray(); 
    }

The above one felt more efficient to me.

 public static byte[] stringTobyte(string hexString)
    {
        try
        {
            int bytesCount = (hexString.Length) / 2;
            byte[] bytes = new byte[bytesCount];
            for (int x = 0; x < bytesCount; ++x)
            {
                bytes[x] = Convert.ToByte(hexString.Substring(x * 2, 2), 16);
            }
            return bytes;
        }
        catch
        {
            throw;
        }
Community
  • 1
  • 1
Mask
  • 647
  • 2
  • 9
  • 21
  • 5
    The other question, although ostensibly about conversions in both directions, ends up focussing on the conversion from bytes to hex. This question is about the best conversion in the other direction, so FWIW, I think it adds something. – JasonD Jan 15 '13 at 10:44
  • 1
    This is definitely not a duplicate, as it give a very focused set of answers to the particular question provided – Charles Okwuagwu Aug 21 '15 at 13:56

3 Answers3

28

If you really need efficiency then:

  • Don't create substrings
  • Don't create an iterator

Or, and get rid of try blocks which only have a catch block which rethrows... for simplicity rather than efficiency though.

This would be a pretty efficient version:

public static byte[] ParseHex(string hexString)
{
    if ((hexString.Length & 1) != 0)
    {
        throw new ArgumentException("Input must have even number of characters");
    }
    int length = hexString.Length / 2;
    byte[] ret = new byte[length];
    for (int i = 0, j = 0; i < length; i++)
    {
        int high = ParseNybble(hexString[j++]);
        int low = ParseNybble(hexString[j++]);
        ret[i] = (byte) ((high << 4) | low);
    }

    return ret;
}

private static int ParseNybble(char c)
{
    // TODO: Benchmark using if statements instead
    switch (c)
    {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
            return c - '0';
        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
            return c - ('a' - 10);
        case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
            return c - ('A' - 10);
        default:
            throw new ArgumentException("Invalid nybble: " + c);
    }
    return c;
}

The TODO refers to an alternative like this. I haven't measured which is faster.

private static int ParseNybble(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    c = (char) (c & ~0x20);
    if (c >= 'A' && c <= 'F')
    {
        return c - ('A' - 10);
    }
    throw new ArgumentException("Invalid nybble: " + c);
}
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • @MitchWheat: I suspect so - it creates a new string object for each byte. Even though the GC is pretty efficient, that's still extra work which really isn't required. – Jon Skeet Jan 15 '13 at 07:05
  • @MitchWheat: I'll add a link to here... – Jon Skeet Jan 15 '13 at 07:08
  • @MitchWheat: Thanks - return statement added. – Jon Skeet Jan 15 '13 at 07:13
  • Yes it is faster, i have checked. – Mask Jan 15 '13 at 08:01
  • 1
    In the `if` variant you can fold the tests for upper- and lowercase by using `c & ~0x20` – CodesInChaos Jan 15 '13 at 09:50
  • @CodesInChaos: That would make it accept some invalid input, wouldn't it? – Jon Skeet Jan 15 '13 at 09:51
  • @JonSkeet I doubt it. The only characters that will be mapped to `'A'...'F'` are `'A'...'F'` and `'a'...'f'` – CodesInChaos Jan 15 '13 at 09:57
  • @CodesInChaos: Ah, I'm with you (I think). Yup, will modify that. – Jon Skeet Jan 15 '13 at 10:09
  • @JonSkeet: Actually there's a faster ParseNybble possible based on a simple table lookup. In my experience it's always a good idea to remove if-then-else constructions if you're going for performance. Basically what you do is fill an array with 64K -1's and fill in all the chars. Then you can implement ParseNybble as: if ((r = lookup[(ushort)c]) >= 0) { return r; } – atlaste Jan 15 '13 at 18:18
  • @StefandeBruijn: I would probably restrict it by only going as far as 'f', but potentially still use the lookup. You need two comparisons then, but the difference between 64K (which is enormous and cache-busting) and ~100 bytes is significant IMO. – Jon Skeet Jan 15 '13 at 18:32
  • @JonSkeet well actually it is just as fast, contrary of what I'm used to in C++... at least that's what the tests tell me. But even in our world of cheap gigabytes, it does seem like a kind of a waste though. We agree. – atlaste Jan 15 '13 at 20:14
  • Looks like [revision 7](http://stackoverflow.com/posts/14332574/revisions) introduced a subtle bug - it's comparing to `'A'...'F'` (0x41...0x46) but subtracting `'a'` (0x61) (whereas it was subtracting `'A'` before the edit)... – johnny Jun 20 '13 at 16:26
  • Are you missing a `default:` before the throw exception in the switch statement? – Henry Ing-Simmons Dec 08 '14 at 11:20
  • @JonSkeet Jon , To be picky , wouldn't it be better just to "upper case" it ? ( without checking the condition...) I mean : the checking itself converts both `'c'` and `'0'` and `'9'` to the int representation and ALSO checks if it found in an interval , whereas setting the bits(`And` & `not`) is much quicker low level operation....don't you think ? ( again ... just to be picky :-;) – Royi Namir Mar 03 '15 at 07:50
  • @RoyiNamir: I'm not sure which bit of code you're talking about, or which "without checking the condition" you're talking about. Given that there are several conditions in my answer, please be very specific. – Jon Skeet Mar 03 '15 at 08:22
  • @JonSkeet I was referring to [this](http://i.imgur.com/suYA19X.png) – Royi Namir Mar 03 '15 at 08:23
  • @RoyiNamir: But without the '0'-'9' check, you'd end up with `'0'` mapping to an integer value of 16... so when you subtract 'A' you'd end up with a negative result. Basically, you need to subtract different offsets for digits than for letters. – Jon Skeet Mar 03 '15 at 08:26
  • Yes silly me. was looking at that particular lines and not the whole method. – Royi Namir Mar 03 '15 at 08:29
  • p.s. i'm doing some measurements and _c = (char) c & ~0x20;_ is not compiling. (int to char) – Royi Namir Mar 10 '15 at 10:31
  • @RoyiNamir: Fixed. Just add brackets... – Jon Skeet Mar 10 '15 at 10:34
  • In the switch..case version of ParseNybble, the last line (return c) is unreachable as there's already a default – Sudhanshu Mishra Jul 13 '15 at 02:08
  • @JonSkeet Please can you kindly provide the equivalent code in VB.net. (*Code converters are no help*). My attempts either don't compile or introduce type conversion thus killing all performance gains. see http://stackoverflow.com/q/32074624/44080 Thanks. – Charles Okwuagwu Aug 18 '15 at 14:02
5

As a variant of Jon's if based ParseNybble:

public static byte[] ParseHex(string hexString)
{
    if ((hexString.Length & 1) != 0)
    {
        throw new ArgumentException("Input must have even number of characters");
    }
    byte[] ret = new byte[hexString.Length / 2];
    for (int i = 0; i < ret.Length; i++)
    {
        int high = ParseNybble(hexString[i*2]);
        int low = ParseNybble(hexString[i*2+1]);
        ret[i] = (byte) ((high << 4) | low);
    }

    return ret;
}

private static int ParseNybble(char c)
{
    unchecked
    {
        uint i = (uint)(c - '0');
        if(i < 10)
            return (int)i;
        i = ((uint)c & ~0x20u) - 'A';
        if(i < 6)
            return (int)i+10;
        throw new ArgumentException("Invalid nybble: " + c);
    }
}
CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
5

I took the benchmarking code from the other question, and reworked it to test the hex to bytes methods given here:

HexToBytesJon: 36979.7 average ticks (over 150 runs)
HexToBytesJon2: 35886.4 average ticks (over 150 runs)
HexToBytesJonCiC: 31230.2 average ticks (over 150 runs)
HexToBytesJase: 15359.1 average ticks (over 150 runs)

HexToBytesJon is Jon's first version, and HexToBytesJon2 is the second variant. HexToBytesJonCiC is Jon's version with CodesInChaos's suggested code. HexToBytesJase is my attempt, based on both the above but with alternative nybble conversion which eschews error checking, and branching:

    public static byte[] HexToBytesJase(string hexString)
    {
        if ((hexString.Length & 1) != 0)
        {
            throw new ArgumentException("Input must have even number of characters");
        }
        byte[] ret = new byte[hexString.Length/2];
        for (int i = 0; i < ret.Length; i++)
        {
            int high = hexString[i*2];
            int low = hexString[i*2+1];
            high = (high & 0xf) + ((high & 0x40) >> 6) * 9;
            low = (low & 0xf) + ((low & 0x40) >> 6) * 9;

            ret[i] = (byte)((high << 4) | low);
        }

        return ret;
    }
Community
  • 1
  • 1
JasonD
  • 16,464
  • 2
  • 29
  • 44
  • Weird, in my tests, my code is faster than Jon's, and Jon's two solutions have essentially the same speed. Are you running without debugger attached? – CodesInChaos Jan 15 '13 at 10:23
  • Oops, I was running release, but with debugger. I'll amend the results! – JasonD Jan 15 '13 at 10:25
  • I also changed `ParseHex` a bit, removing `j` and changing the termination condition of the loop. – CodesInChaos Jan 15 '13 at 10:27
  • Yes, I've taken your updated code, it seems roughly the same, but I'll update the numbers. – JasonD Jan 15 '13 at 10:30
  • 1
    As far as I can tell, this doesn't detect invalid data. That may or may not be required by the OP, but I suspect it's relevant to performance. (There are far fewer branches in this code.) – Jon Skeet Jan 21 '13 at 07:07
  • @JonSkeet correct - I did point that out in the post, in case error checking was actually important (and it may be better to checksum or hash larger chunks, rather than erroring on every byte). FWIW I benchmarked the others with the error/exception handling removed, and IIRC they were somewhere in the middle. – JasonD Jan 21 '13 at 07:33
  • @JasonD The hi and lo computations remove the need for a nybble lookup. Please do you have the reverse, to simplify and possibly speed up the byte2hex solution? – Charles Okwuagwu Aug 21 '15 at 13:58