
I am attempting to emulate the way a C# application converts a UUID to a Base64 value. For some reason, I can get part of the string to match the expected value, but not the whole string.

The C# code I was given:

public static string ToShortGuid(this Guid newGuid) {
    string modifiedBase64 = Convert.ToBase64String(newGuid.ToByteArray())
        .Replace('+', '-').Replace('/', '_') // avoid invalid URL characters
        .Substring(0, 22);
    return modifiedBase64;
}

What I have tried in Python 3.6:

import uuid
import base64

encode_str = base64.urlsafe_b64encode(uuid.UUID("fa190535-6b00-4452-8ab1-319c73082b60").bytes)
print(encode_str)

"fa190535-6b00-4452-8ab1-319c73082b60" is a known UUID, and the application apparently uses the above C# code to generate a 'ShortGuid' value of "NQUZ-gBrUkSKsTGccwgrYA".

When I process the same UUID through my Python code, I get: "-hkFNWsARFKKsTGccwgrYA=="

From both of these output strings, this part matches: "KsTGccwgrYA", but the rest doesn't.

melpomene
MidnightThoughtful
  • 2
    Decoding `NQUZ-gBrUkSKsTGccwgrYA==` with `urlsafe_b64decode` gives me `b'5\x05\x19\xfa\x00kRD\x8a\xb11\x9cs\x08+\`'`, but `UUID("fa190535-6b00-4452-8ab1-319c73082b60").bytes` is `b'\xfa\x19\x055k\x00DR\x8a\xb11\x9cs\x08+\`'` which is different. Are you ***sure*** the UUIDs are identical in C# and python? – Aran-Fey Jun 30 '18 at 12:15
  • What I'm reasonably sure of is that "fa190535-6b00-4452-8ab1-319c73082b60" is the starting UUID that goes into the C# function, and that "NQUZ-gBrUkSKsTGccwgrYA" is what it returns. The Python code above is just my attempt to get to the same return value from the same starting value, and as you can see, the string from the Python code partially matches. – MidnightThoughtful Jun 30 '18 at 12:19
  • Possible duplicate of [Why does Guid.ToByteArray() order the bytes the way it does?](https://stackoverflow.com/questions/9195551/why-does-guid-tobytearray-order-the-bytes-the-way-it-does) – melpomene Jun 30 '18 at 12:20
  • 1
    @melpomene I wouldn't consider that a dupe. Surely an answer to this question should include a solution that makes the generated UUIDs identical? – Aran-Fey Jun 30 '18 at 12:25
  • Agree with @Aran-Fey - the linked item is helpful to the question, but I don't think is a duplicate. – MidnightThoughtful Jun 30 '18 at 12:27

2 Answers

8

NQUZ-gBrUkSKsTGccwgrYA corresponds to a byte sequence of 350519fa006b52448ab1319c73082b60.

If we add - in the appropriate locations, we get:

 350519fa-006b-5244-8ab1-319c73082b60
#   \/     \/   \/
#   /\     /\   /\
 fa190535-6b00-4452-8ab1-319c73082b60

Compared to the known UUID you started with, the bytes are the same, but the order within the first 3 subgroups is reversed.
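As a quick check of the comparison above, here is a minimal sketch that decodes the expected ShortGuid and prints both byte sequences as hex. The variable names are illustrative; the two "=" characters are re-appended because the C# code truncates the Base64 string to 22 characters.

```python
import base64
import uuid

short = "NQUZ-gBrUkSKsTGccwgrYA"

# Restore the "==" padding that Substring(0, 22) removed, then decode.
raw = base64.urlsafe_b64decode(short + "==")

print(raw.hex())
# 350519fa006b52448ab1319c73082b60

print(uuid.UUID("fa190535-6b00-4452-8ab1-319c73082b60").hex)
# fa1905356b0044528ab1319c73082b60
```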

To emulate what .NET does, you need to use UUID.bytes_le:

The UUID as a 16-byte string (with time_low, time_mid, and time_hi_version in little-endian byte order).

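See also: Why does Guid.ToByteArray() order the bytes the way it does? (https://stackoverflow.com/questions/9195551/why-does-guid-tobytearray-order-the-bytes-the-way-it-does)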
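To make the reordering concrete, the following sketch reverses the first three groups by hand and checks that the result equals bytes_le. The manual slicing is only for illustration; in practice bytes_le alone should be enough.

```python
import uuid

u = uuid.UUID("fa190535-6b00-4452-8ab1-319c73082b60")
b = u.bytes  # big-endian order, as in the string representation

# Reverse the 4-byte group and the two 2-byte groups; keep the last 8 bytes as-is.
swapped = b[0:4][::-1] + b[4:6][::-1] + b[6:8][::-1] + b[8:]

print(swapped == u.bytes_le)
# True
print(swapped.hex())
# 350519fa006b52448ab1319c73082b60
```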

melpomene
  • 1
    Indeed! This byte swapping is documented by Microsoft at https://msdn.microsoft.com/en-us/library/system.guid.tobytearray(v=vs.110).aspx - *"Note that the order of bytes in the returned byte array is different from the string representation of a Guid value. The order of the beginning four-byte group and the next two two-byte groups is reversed, whereas the order of the last two-byte group and the closing six-byte group is the same."* – John Zwinck Jun 30 '18 at 12:22
  • Interesting, how do I correct for that in python? (I need to do this in both directions -- from UUID to converted value, from converted value back to UUID) – MidnightThoughtful Jun 30 '18 at 12:24
6

You need to use bytes_le to get the endianness to match Microsoft's:

base64.urlsafe_b64encode(uuid.UUID("fa190535-6b00-4452-8ab1-319c73082b60").bytes_le)

That gives b'NQUZ-gBrUkSKsTGccwgrYA=='.
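The C# method also drops the trailing "==" by taking only the first 22 characters. A minimal sketch of both directions might look like the following; the function names to_short_guid and from_short_guid are hypothetical, chosen here to mirror the C# method, and the decoder assumes its input is a 22-character ShortGuid produced this way.

```python
import base64
import uuid


def to_short_guid(u: uuid.UUID) -> str:
    """Encode a UUID the way the C# ToShortGuid method does."""
    # bytes_le matches the byte order of Guid.ToByteArray().
    encoded = base64.urlsafe_b64encode(u.bytes_le).decode("ascii")
    # 16 bytes encode to 24 characters ending in "=="; keep the first 22.
    return encoded[:22]


def from_short_guid(s: str) -> uuid.UUID:
    """Decode a 22-character ShortGuid back into a UUID."""
    # Re-append the padding that was truncated, then undo the byte swap.
    raw = base64.urlsafe_b64decode(s + "==")
    return uuid.UUID(bytes_le=raw)


original = uuid.UUID("fa190535-6b00-4452-8ab1-319c73082b60")
short = to_short_guid(original)
print(short)
# NQUZ-gBrUkSKsTGccwgrYA
print(from_short_guid(short))
# fa190535-6b00-4452-8ab1-319c73082b60
```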

Aran-Fey
John Zwinck