2

I am in the progress of migrating a database from Microsoft SQL Server to MySQL/MariaDB. On MSSQL, the database uses the uniqueidentifier (GUID) datatype for all primary keys. NHibernate is used to map data between database and application, and the guid.comb strategy is employed for GUID generation to avoid fragmentation of the clustered indexes.

MySQL not having a dedicated GUID data type, the new database schema uses BINARY(16) for all identifiers. Without making any changes to the NHibernate mappings, I can start our application, persist new entities and load them back up from the MySQL database. Great! However, it turns out that the sequantially generated GUIDs are ordered very much non-sequantially in the BINARY(16) column, yielding unacceptable index fragmentation.

Reading up on the issue, it turns out that MSSQL has a quite special method for sorting GUIDs. The 16 bytes are ordered first by the last six bytes, then by the precending in reversed groups, whereas my naïve MySQL implementation sorts on the first byte first, then the next and so forward.

And this leads to my question: How to avoid this fragmentation in the MySQL database, while keeping the existing GUIDs and the guid.comb strategy? I have an idea for a solution myself (posted below) but I cannot help feeling I might have missed something. Surely, others must have dealt with this issue before, and maybe there is a simple way to get around it.

Community
  • 1
  • 1
Jørn Schou-Rode
  • 37,718
  • 15
  • 88
  • 122

1 Answers1

3

As observed by Alberto Ferrari and discussed here on StackOverflow, Microsoft SQL Server sorts GUIDs by comparing the bytes in a specific order. As MySQL will sort a BINARY(16) "straight-foward", all we need to do, is to reorder the bytes when reading/writing to the database.

NHibernate allows us to define custom data types, which can be used in mappings between database and objects. I have implemented a BinaryGuidType, capable of reordering the bytes produced by Guid.ToByteArray() according to the way MSSQL sorts GUIDs and reordering them back into the format accepted by the Guid(byte[]) constructor.

The byte order looks like this:

int[] ByteOrder = new[] { 10,11,12,13,14,15,8,9,6,7,4,5,0,1,2,3 };

Saving a System.Guid to a BINARY(16) goes like this:

var bytes = ((Guid) value).ToByteArray();
var reorderedBytes = new byte[16];

for (var i = 0; i < 16; i++)
{
    reorderedBytes[i] = bytes[ByteOrder[i]];
}

NHibernateUtil.Binary.NullSafeSet(cmd, reorderedBytes, index);

Reading the bytes back into a System.Guid goes like this:

var bytes = (byte[]) NHibernateUtil.Binary.NullSafeGet(rs, names[0]);
if (bytes == null || bytes.Length == 0) return null;

var reorderedBytes = new byte[16];

for (var i = 0 ; i < 16; i++)
{
    reorderedBytes[ByteOrder[i]] = bytes[i];
}

Full source code for the BinaryGuidType here.

This seems to work well. Creating and persisting 10.000 new objects in a table, they are stored completely sequentially, with no signs of index fragmentation.

Community
  • 1
  • 1
Jørn Schou-Rode
  • 37,718
  • 15
  • 88
  • 122