3

There are many how-to's on how to create Guid's that are Sql server index friendly, for example this tutorial. Another popular method is the one (listed below) from the NHibernate implementation. So I thought it could be fun to write a test method that actually tested the sequential requirements of such code. But I fail - I don't know what makes a good Sql server sequence. I can't figure out how they are ordered.

For example, given the two different way to create a sequential guid, how to determine which is the best (other than speed)? For example it looks like both have the disadvantage that if their clock is set back 2 minutes (e.g. timeserver update) they sequences are suddenly broken? But would that also mean trouble for the Sql sever index?

I use this code to produce the sequential Guid:

public static Guid CombFromArticle()
{
   var randomBytes = Guid.NewGuid().ToByteArray();
   byte[] timestampBytes = BitConverter.GetBytes(DateTime.Now.Ticks / 10000L);

   if (BitConverter.IsLittleEndian)
      Array.Reverse(timestampBytes);

   var guidBytes = new byte[16];

   Buffer.BlockCopy(randomBytes, 0, guidBytes, 0, 10);
   Buffer.BlockCopy(timestampBytes, 2, guidBytes, 10, 6);

   return new Guid(guidBytes);
}

public static Guid CombFromNHibernate()
{
  var destinationArray = Guid.NewGuid().ToByteArray();
  var time = new DateTime(0x76c, 1, 1);
  var now = DateTime.Now;
  var span = new TimeSpan(now.Ticks - time.Ticks);
  var timeOfDay = now.TimeOfDay;
  var bytes = BitConverter.GetBytes(span.Days);
  var array = BitConverter.GetBytes((long)(timeOfDay.TotalMilliseconds / 3.333333));
  Array.Reverse(bytes);
  Array.Reverse(array);
  Array.Copy(bytes, bytes.Length - 2, destinationArray, destinationArray.Length - 6, 2);
  Array.Copy(array, array.Length - 4, destinationArray, destinationArray.Length - 4, 4);
  return new Guid(destinationArray);
}

The one from the article is slightly faster but which creates the best sequence for SQL server? I could populate 1 million records and compare the fragmentation but I'm not even sure how to validate that properly. And in any case, I'd like to understand how I could write a test case that ensures the sequences are sequences as defined by Sql server!

Also I'd like some comments on these two implementations. What makes one better than the other?

Werner
  • 1,229
  • 1
  • 10
  • 24
  • Aren't you prematurely optimizing? – David Brabant Jun 27 '12 at 06:37
  • In SQL Server 2012 sequences would be capable of doing this and would most likely be more efficient than anything you wrote yourself – jaypeagi Jun 27 '12 at 06:44
  • @jaypeagi interesting thought but I'm guessing why they are using a GUID in the first place is so that it can be application generated with a fire and forget insert. using sequences would mean that the sequence would have to be queried from the source of truth (SQL). If sequences would work I'm curious why a normal identity column wouldn't work (faster!) – John Culviner Jun 27 '12 at 06:54
  • possible duplicate of [Is there a .NET equalent to SQL Servers newsequentialid()](http://stackoverflow.com/questions/211498/is-there-a-net-equalent-to-sql-servers-newsequentialid) – Kane Jun 27 '12 at 07:13
  • @Kane, I'm not asking how to implement it, I'm asking how to validate the sequence produced by various algorithms. The two examples I gave would be equivalent to Sql server (2005) but I can't find anything, anywhere on how to test that equivalence... – Werner Jun 27 '12 at 07:37
  • @jaypeagi and John: Yes what we eventually want is to create Guid's client side (from some C# algorithm). We can anticipate several million records in this system so Sql server index fragmentation is a great concern. Having a database with 50 million records with non sequential guids slows Sql server considerably down. But that just the reason behind - and really about my question where I'm trying to understand what makes a good sequence for Sql server. And ultimately how to write a test case to verify this in C#. – Werner Jun 27 '12 at 07:42
  • @David, yes definately though this being a specific algorithmic optimization issue and not related to business decisions/logic whatsoever. – Werner Jun 27 '12 at 07:45
  • 1
    I think I wrote the article you mention, and probably the biggest lesson I learned is that maintaining the sequence is only important in the big picture -- small aberrations don't really affect performance. If you set the clock back two minutes, you might see somewhat degraded performance for that time, but then again, you might not; it would depend on how the current page is populated and how many other pages have to be reshuffled. It would still be much better than a random GUID, and once the clock "caught back up" the sequence would continue increasing and things would go back to normal. – Jeremy Todd Jun 27 '12 at 17:37

1 Answers1

0

I generated sequential GUIDs for an SQL Server. I never looked at too many articles beforehand.. still, it seems sound.

The first one, I generate with a system function (to get a proper one) and the following ones, I simply increment. You have to look for overflow and the like, of course (also, a GUID has several fields).

Apart from that, nothing hard to consider. If 2 GUIDs are unique, so is a sequence of them, if.. you stay below a few million. Well, it is mathematics.. even 2 GUIDs aren't guaranteed to be unique, at least not in the long run (if humanity keeps growing). So by using this kind of sequence, you probably increase the probability of a collision from nearly 0 to nearly 0 (but slightly more). If at all.. ask a mathematician.. it is the Birthday Problem http://en.wikipedia.org/wiki/Birthday_problem , with an insane number of days.

It is in C, but that should be easily translatable to more comfortable languages. Especially, you don't neet to worry about converting the wchar to a char.

GUID guid;
bool bGuidInitialized = false;
void incrGUID()
{
    for (int i = 7; i >= 0; --i)
    {
        ++guid.Data4[i];
        if (guid.Data4[i] != 0)
            return;
    }
    ++guid.Data3;
    if (guid.Data3 != 0)
        return;
    ++guid.Data2;
    if (guid.Data2 != 0)
        return;
    ++guid.Data1;
    if (guid.Data1 != 0)
        return;
}

GenerateGUID(char *chGuid)
{
    if (!bGuidInitialized)
    {
        CoCreateGuid(&guid); 
        bGuidInitialized = true;
    }
    else
        incrGUID();

    WCHAR temp[42];
    StringFromGUID2(guid, temp, 42-1);
    wcstombs(chGuid, &(temp[1]), 42-1);
    chGuid[36] = 0;

    if (!onlyOnceLogGUIDAlreadyDone)
    {
        onlyOnceLogGUIDAlreadyDone = true;
        WR_cTools_LogTime(chGuid);
    }

    return ReturnCode;
}
Andreas Reiff
  • 7,961
  • 10
  • 50
  • 104