
I have an unusual situation whereby I have an existing MySQL database that uses binary(16) primary keys; these are the basis for the UUIDs used in an existing API.

My problem is that I now want to add a replacement API written with .NET Core, and I'm running into an encoding problem that has been explained here.

Specifically, the Guid struct in .NET uses a mixed-endian format, so it produces a different string from the existing API. This isn't acceptable for obvious reasons.
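To make the mismatch concrete, here is a minimal sketch (assuming .NET 5 or later for `Convert.ToHexString`; the byte values are illustrative, not taken from a real database) that builds a `Guid` from 16 raw bytes, as they might come out of a `binary(16)` column, and compares `Guid.ToString()` with a plain hex dump of the same bytes in storage order:

```csharp
using System;

// 16 raw bytes as they might be stored in a MySQL binary(16) column (illustrative values)
byte[] raw = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
               0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F };

// Guid(byte[]) reads the first three fields (4, 2 and 2 bytes) as little-endian
var guid = new Guid(raw);

// .NET's own string: the first 4, next 2 and next 2 bytes come out reversed
string dotnetString = guid.ToString();

// A hex dump of the bytes in storage (big-endian) order, lowercased to match Guid.ToString()
string storageHex = Convert.ToHexString(raw).ToLowerInvariant();

Console.WriteLine(dotnetString); // 03020100-0504-0706-0809-0a0b0c0d0e0f
Console.WriteLine(storageHex);   // 000102030405060708090a0b0c0d0e0f
```

Only the last 8 bytes agree, which is exactly the "mixed" part of mixed-endian.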

So my question is this: is there an elegant way to force the Guid struct to encode entirely with the big-endian format?

If there isn't, I can just write a terrible hack, but I thought I'd check with the collective intelligence of the SO community first!

Thomas Horrobin

2 Answers


Nope; as far as I'm aware there's no inbuilt way to get this. And yes, Guid has what I can only call "crazy-endian" implementation currently. You'd need to get the Guid-ordered bits (either via unsafe or Guid.ToByteArray) and then order them manually, figuring out which chunks to reverse - it isn't a simple Array.Reverse(). So: very manual, I'm afraid. I suggest using a guid like

00010203-0405-0607-0809-0a0b0c0d0e0f

to debug it; this gives you (as I suspect you are aware):

03-02-01-00-05-04-07-06-08-09-0A-0B-0C-0D-0E-0F

so:

  • reverse 4
  • reverse 2
  • reverse 2
  • straight 8
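The recipe above (reverse 4, reverse 2, reverse 2, straight 8) can be sketched as a pair of helpers. The names `ToBigEndianBytes`, `FromBigEndianBytes` and `SwapGuidByteOrder` are hypothetical, not part of .NET; because the swap is its own inverse, the same reordering serves both directions:

```csharp
using System;

// Hypothetical helper: turn Guid.ToByteArray()'s mixed-endian layout into big-endian order
static byte[] ToBigEndianBytes(Guid guid)
{
    byte[] bytes = guid.ToByteArray();
    SwapGuidByteOrder(bytes);
    return bytes;
}

// Hypothetical helper: build a Guid from 16 big-endian bytes (e.g. a binary(16) value)
static Guid FromBigEndianBytes(byte[] bigEndian)
{
    if (bigEndian.Length != 16)
        throw new ArgumentException("A Guid needs exactly 16 bytes.", nameof(bigEndian));
    byte[] copy = (byte[])bigEndian.Clone(); // don't mutate the caller's array
    SwapGuidByteOrder(copy);
    return new Guid(copy);
}

// reverse 4, reverse 2, reverse 2, leave the last 8 untouched
static void SwapGuidByteOrder(byte[] b)
{
    Array.Reverse(b, 0, 4);
    Array.Reverse(b, 4, 2);
    Array.Reverse(b, 6, 2);
}

var debugGuid = new Guid("00010203-0405-0607-0809-0a0b0c0d0e0f");
Console.WriteLine(BitConverter.ToString(debugGuid.ToByteArray()));
Console.WriteLine(BitConverter.ToString(ToBigEndianBytes(debugGuid)));
```

With the debug Guid, the first line prints the mixed-endian `03-02-01-00-05-04-07-06-...` layout shown above and the second prints `00-01-02-...-0F` in order.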
Marc Gravell
  • I was concerned this might be the case. It's frustrating because, while I don't mind doing a bit of manual coding, this is going to obscure what would otherwise be very self-explanatory code. Guess I'll just have to add some lengthy comments. Thanks for the debug tips though! – Thomas Horrobin Jan 08 '18 at 10:06

As of 2023, there still isn't a built-in way to convert a System.Guid to a MySQL-compatible big-endian string in C#.

Here's the extension we came up with when we encountered this exact C# mixed-endian Guid problem at work:

Original Version

using System;
using System.Text;

public static class GuidExtensions
{
    public static string ToStringBigEndian(this Guid guid)
    {
        // allocate enough bytes to store Guid ASCII string
        Span<byte> result = stackalloc byte[36];
        // set all bytes to 0xFF (to be able to distinguish them from real data)
        result.Fill(0xFF);
        // get bytes from guid (same order as Guid.ToByteArray(), i.e. the order new Guid(byte[]) consumed them)
        Span<byte> buffer = stackalloc byte[16];
        _ = guid.TryWriteBytes(buffer);
        int skip = 0;
        // iterate over guid bytes
        for (int i = 0; i < buffer.Length; i++)
        {
            // bytes 4, 6, 8 and 10 are each preceded by a '-' delimiter in the Guid string.
            // --> leave space for those delimiters
            if (i is 4 or 6 or 8 or 10)
            {
                skip++;
            }
            // split every byte into its high and low nibble, one output byte each (skipping '-' delimiter slots)
            result[(2 * i) + skip] = (byte)(buffer[i] >> 0x4);
            result[(2 * i) + 1 + skip] = (byte)(buffer[i] & 0x0Fu);
        }
        // iterate over precomputed byte array.
        // values 0x0 to 0xF are final hex values, but must be mapped to ASCII characters.
        // value 0xFF is to be mapped to '-' delimiter character.
        for (int i = 0; i < result.Length; i++)
        {
            // map bytes to ASCII values (a-f will be lowercase)
            ref byte b = ref result[i];
            b = b switch
            {
                0xFF => 0x2D,                // Map 0xFF to '-' character
                < 0xA => (byte)(b + 0x30u),  // Map 0x0 - 0x9 to '0' - '9'
                _ => (byte)(b + 0x57u)       // Map 0xA - 0xF to 'a' - 'f'
            };
        }

        // get string from ASCII encoded guid byte array
        return Encoding.ASCII.GetString(result);
    }
}

It's a bit lengthy, but apart from the string it returns it does no heap allocations, which keeps it fast :)
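If you want to check the span-based extension, a short allocating equivalent makes a convenient test oracle. This is a sketch, not the author's code: `ToStringBigEndianReference` is a hypothetical name, and it assumes .NET 5 or later for `Convert.ToHexString`. Like the extension above, it hex-dumps the bytes in the order `TryWriteBytes`/`ToByteArray` produce them, which is the order they had when the `Guid` was built with `new Guid(byte[])` from the database value:

```csharp
using System;

// Hypothetical reference version: readable, but allocates several intermediate strings
static string ToStringBigEndianReference(Guid guid)
{
    // 32 lowercase hex chars, bytes in ToByteArray() order
    string hex = Convert.ToHexString(guid.ToByteArray()).ToLowerInvariant();
    // insert dashes from right to left so earlier indices stay valid (final dash positions 8, 13, 18, 23)
    return hex.Insert(20, "-").Insert(16, "-").Insert(12, "-").Insert(8, "-");
}

// illustrative binary(16) value read from the database
byte[] fromDb = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                  0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F };
Console.WriteLine(ToStringBigEndianReference(new Guid(fromDb))); // 00010203-0405-0607-0809-0a0b0c0d0e0f
```

Comparing the two implementations over a batch of `Guid.NewGuid()` values is a cheap way to catch off-by-one mistakes in the dash handling.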

Update 2023: Faster Version

Fewer branches => fewer branch mispredictions => fewer pipeline stalls => faster.

using System;
using System.Runtime.CompilerServices;
using System.Text;

public static class GuidExtensions
{
    public static string ToStringBigEndian(this Guid guid)
    {
        // allocate enough bytes to store Guid ASCII string
        Span<byte> result = stackalloc byte[36];
        // get bytes from guid (same order as Guid.ToByteArray())
        Span<byte> buffer = stackalloc byte[16];
        _ = guid.TryWriteBytes(buffer);
        int skip = 0;
        // iterate over guid bytes
        for (int i = 0; i < buffer.Length; i++)
        {
            // bytes 4, 6, 8 and 10 are each preceded by a '-' delimiter in the Guid string.
            // --> leave space for those delimiters
            // we can check if i is even and i / 2 is >= 2 and <= 5 to determine if we are at one of those indices
            // 0xF...F if i is odd and 0x0...0 if i is even
            int isOddMask = -(i & 1);
            // 0xF...F if i / 2 is < 2 and 0x0...0 if i / 2 is >= 2
            int less2Mask = ((i >> 1) - 2) >> 31;
            // 0xF...F if i / 2 is > 5 and 0x0...0 if i / 2 is <= 5
            int greater5Mask = ~(((i >> 1) - 6) >> 31);
            // 0xF...F if i is even and 2 <= i / 2 <= 5 otherwise 0x0...0
            int skipIndexMask = ~(isOddMask | less2Mask | greater5Mask);
            // skipIndexMask will be 0xFFFFFFFF for indices 4, 6, 8 and 10 and 0x00000000 for all other indices
            // --> skip those indices
            skip += 1 & skipIndexMask;
            // >>> (unsigned right shift) needs C# 11; >> gives the same result here since buffer[i] is non-negative
            result[(2 * i) + skip] = ToHexCharBranchless(buffer[i] >>> 0x4);
            result[(2 * i) + skip + 1] = ToHexCharBranchless(buffer[i] & 0x0F);
        }
        // add dashes
        const byte dash = (byte)'-';
        result[8] = result[13] = result[18] = result[23] = dash;
        // get string from ASCII encoded guid byte array
        return Encoding.ASCII.GetString(result);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static byte ToHexCharBranchless(int b) =>
        // b + 0x30 for [0-9] if 0 <= b <= 9 and b + 0x30 + 0x27 for [a-f] if 10 <= b <= 15
        (byte)(b + 0x30 + (0x27 & ~((b - 0xA) >> 31)));
}

Benchmark results indicate a performance improvement of ~30%:

[Image: benchmark results]
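The bit tricks in the faster version are easy to get subtly wrong, so it can be worth checking them exhaustively against the straightforward branching logic of the original version. A minimal sketch: `SkipIndexMask` is a hypothetical helper that just pulls the mask computation out of the loop, and `ToHexCharBranchless` is copied from above:

```csharp
using System;

// Same formula as ToHexCharBranchless above: '0' offset, plus 0x27 more when b >= 10
static byte ToHexCharBranchless(int b) =>
    (byte)(b + 0x30 + (0x27 & ~((b - 0xA) >> 31)));

// Hypothetical helper: the loop's mask arithmetic, -1 (all bits set) only for i = 4, 6, 8, 10
static int SkipIndexMask(int i)
{
    int isOddMask = -(i & 1);
    int less2Mask = ((i >> 1) - 2) >> 31;
    int greater5Mask = ~(((i >> 1) - 6) >> 31);
    return ~(isOddMask | less2Mask | greater5Mask);
}

// Compare against the obvious branching versions for every possible input
for (int b = 0; b < 16; b++)
{
    char expected = "0123456789abcdef"[b];
    if (ToHexCharBranchless(b) != (byte)expected)
        throw new InvalidOperationException($"hex mismatch at {b}");
}
for (int i = 0; i < 16; i++)
{
    bool expected = i is 4 or 6 or 8 or 10;
    if ((SkipIndexMask(i) == -1) != expected)
        throw new InvalidOperationException($"mask mismatch at {i}");
}
Console.WriteLine("branchless helpers match the branching versions");
```

Since both inputs range over only 16 values, this covers every case the loop can hit.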

Frederik Hoeft