3

I have a function that takes as input an array of unsigned char (16 values exactly) and creates a GUID (from GUID struct) by parsing all values to hexadecimal and passing a guid-formatted string to UuidFromStringA().

My code is as follows:

GUID CreateGuid(const uint8_t* data)
{
    createGuidFromBufferData(hexValues, data);
    int uuidCreationReturnCode = UuidFromStringA((RPC_CSTR)hexValues, &guid);
    return guid;
}
            
inline void createGuidFromBufferData(char* hexValues, const uint8_t* data)
{
    decimalToHexadecimal(data[3], hexValues, 0);
    decimalToHexadecimal(data[2], hexValues, 2);
    decimalToHexadecimal(data[1], hexValues, 4);
    decimalToHexadecimal(data[0], hexValues, 6);
    hexValues[8] = '-';
    decimalToHexadecimal(data[5], hexValues, 9);
    decimalToHexadecimal(data[4], hexValues, 11);
    hexValues[13] = '-';
    decimalToHexadecimal(data[6], hexValues, 14);
    decimalToHexadecimal(data[7], hexValues, 16);
    hexValues[18] = '-';
    decimalToHexadecimal(data[8], hexValues, 19);
    decimalToHexadecimal(data[9], hexValues, 21);
    hexValues[23] = '-';
    decimalToHexadecimal(data[10], hexValues, 24);
    decimalToHexadecimal(data[11], hexValues, 26);
    decimalToHexadecimal(data[12], hexValues, 28);
    decimalToHexadecimal(data[13], hexValues, 30);
    decimalToHexadecimal(data[14], hexValues, 32);
    decimalToHexadecimal(data[15], hexValues, 34);
}

inline void decimalToHexadecimal(uint8_t decimalValue, char* outputBuffer, int currentIndex)
{
    const char hexValues[] = "0123456789abcdef";
    outputBuffer[currentIndex] = hexValues[decimalValue >> 4];
    outputBuffer[currentIndex + 1] = hexValues[decimalValue & 0xf];
}

This works fine, but I would like to do something a bit more efficient, and create the GUID directly using my input char array with something like this:

GUID CreateGuid(const uint8_t* data)
{
    GUID guid = { 
        *reinterpret_cast<const unsigned long*>(data), 
        *reinterpret_cast<const unsigned short*>(data + 4), 
        *reinterpret_cast<const unsigned short*>(data + 6), 
        *reinterpret_cast<const unsigned char*>(data + 8)
    };
    return guid;
}

When doing so, only one of the last 8 bytes is set and the rest are 0; For example using unsigned char array [38, 150, 233, 16, 43, 188, 117, 76, 187, 62, 254, 96, 109, 226, 87, 0]

when I should be getting:

10e99626-bc2b-754c-bb3e-fe606de25700

I'm getting instead:

10e99626-bc2b-75dc-bb00-000000000000}

Lundin
  • 195,001
  • 40
  • 254
  • 396
Uri Shapira
  • 369
  • 1
  • 2
  • 17

2 Answers2

3

GUID is an aggregate with a trivial copy assignment. Because of that, you should be able to do this directly.

GUID CreateGuid(const uint8_t* data)
{
    return *reinterpret_cast<GUID*>(data)
}
IWonderWhatThisAPIDoes
  • 1,000
  • 1
  • 4
  • 14
  • Isn't that UB? In C++20 you could use `std::bit_cast` instead of `reinterpret_cast`. – Timo Aug 25 '21 at 08:48
  • Also note that, although the `GUID` structure is well-designed (`long, short, short, char[]`) in regards to padding, this method is not guaranteed to work on any 'trivial' aggregate, because of potential padding between elements. – Adrian Mole Aug 25 '21 at 08:53
1

You can't fill an 8-character array simply by dereferencing a cast pointer to your source data and assigning that value to its first element, as you are attempting. As you have noticed, doing so will only assign a value to that first element.

For the 8-character array at the end of the GUID structure, you can initialise each element individually, extending the cast + offset approach:

GUID CreateGuid(const uint8_t* data)
{
    GUID guid = {
        *reinterpret_cast<const unsigned long*>(data),
        *reinterpret_cast<const unsigned short*>(data + 4),
        *reinterpret_cast<const unsigned short*>(data + 6),
        { // We need to assign each of the 8 elements of the array ...
            *reinterpret_cast<const unsigned char*>(data + 8),
            *reinterpret_cast<const unsigned char*>(data + 9),
            *reinterpret_cast<const unsigned char*>(data + 10),
            *reinterpret_cast<const unsigned char*>(data + 11),
            *reinterpret_cast<const unsigned char*>(data + 12),
            *reinterpret_cast<const unsigned char*>(data + 13),
            *reinterpret_cast<const unsigned char*>(data + 14),
            *reinterpret_cast<const unsigned char*>(data + 15),
        }
    };
    return guid;
}

Note: As pointed out in the comments (thanks, Timo), using this direct initialization approach risks invoking undefined behaviour, especially if the passed source data buffer isn't suitably aligned for the corresponding destination types, because of the Strict Aliasing Rule. To avoid this, you should use memcpy to transfer data from the source buffer to the destination structure's elements:

GUID CreateGuid(const uint8_t* data)
{
    GUID guid;
    std::memcpy(&guid.Data1, data, sizeof(long));
    std::memcpy(&guid.Data2, data + 4, sizeof(short));
    std::memcpy(&guid.Data3, data + 6, sizeof(short));
    std::memcpy(guid.Data4, data + 8, 8); // sizeof(char) == 1
    return guid;
}

Further, as the GUID structure is defined in such a way that there should be no padding bytes added between its elements, you could copy the data in one fell swoop:

GUID CreateGuid(const uint8_t* data)
{
    GUID guid;
    std::memcpy(&guid, data, sizeof(GUID)); // Assuming no padding!
    return guid;
}
Adrian Mole
  • 49,934
  • 160
  • 51
  • 83
  • Aren't those `reinterpret_cast`s UB because of the type aliasing rule? At least the non-char ones – Timo Aug 25 '21 at 08:52
  • @Timo Possibly. But, if the passed data as a whole are properly aligned (i.e. the `0` element is suitably aligned for a `long`), then there should be no issues. But you're right - using `memcpy()` rather than direct initialisation would be better/safer ... – Adrian Mole Aug 25 '21 at 08:56
  • _"if the passed source data buffer isn't suitably aligned"_. I think, strictly speaking, this doesn't matter for the UB case because you're always violating the strict aliasing rule (unless `data` points to a `GUID` object). Even `float f = 1.0; int32_t x = *reinterpret_cast(&f)` is UB according to strict aliasing, even though they have the same alignment. – Timo Aug 25 '21 at 09:26
  • 1
    @Timo Edited again ... and now I'm going for some much-needed coffee! :-) – Adrian Mole Aug 25 '21 at 09:29