0

I have the following situation: I'm creating a database that will work locally, but can be migrate to a online database that have the same schema and will already have some data. For this reason, I'm planning to use GUIDs as the primary keys for it's tables, as it will make migrations easier by avoiding the need to change primary key values (and cascading it to referencing tables) that would occur if I use any identity column (am I right?). The database is build on SQL Server 2012.

Despite this, I would like to have a clustered index to speed up queries on the database tables, and as I read many articles and Stack Overflow answers, I'm convinced that having a clustered index on a GUID primary key is not a good idea. So, the question is: is there any good solution to have a clustered index on this database tables? I'm open to removing the GUIDs and adding another data type. For a second I thought of using a HiLo approach, but I think that it would make the migration harder, which is an option, but I need a really good reason to choose it (namelly, if there is no good way to use a GUID ad PK and speed up queries).

Until now I thought of these solutions:

  • Using newsequentialid() to generate the new GUIDs, making them sequential and better as a clustered index. The question here is: is this really any better than using a int primary key? The main reason here to use GUIDs is to help with any migration that will eventually happen, and I think that making them sequencial would not help here. Plus, I would have to retrieve the db generated GUID instead of create it on the client side.
  • Using a COMB GUID and making it also the clustered index. It seems to solve some problems that exist in the previous approach, while keeping a high "randomness factor", but I don't know if there are any outcomes. Is there?
  • Adding a identity int column with and using it as the clustered index. My question here is if it will help on anything, since the queries will be made with the PK value anyway, which is a GUID.
  • Slightly chance the previous approach, turning the identity int into the primary key value and clustered index, and adding a GUID collumn with which I would make the queries. I see that on Adventure Works 2012 this is the choice, but I don't really understand where it helps... will the clustered index help if I query the values with the GUID, which is not the PK nor clustered?

I think thats all that I could come up with, and I'm really inclined to choose the COMB approach, and thinking on some experiments to validate if and why it's better. But is there a better solution in this scenario?

Nenad Zivkovic
  • 18,221
  • 6
  • 42
  • 55
Liordino Neto
  • 59
  • 1
  • 13
  • You can use GUID as non-clustered PK, and create clustered index on whatever column(s) are most likely to be used in `WHERE` clause of your queries - ie. some logical key. – Nenad Zivkovic Aug 23 '13 at 13:20
  • 1
    ...or cluster on something that is always increasing, to avoid page splits, like a datetime column (or adding an identity column you don't care that much about). You could also consider just using an IDENTITY instead of a wide GUID in the first place; there are ways to "eventually" migrate IDENTITY data too. – Aaron Bertrand Aug 23 '13 at 13:29
  • @NenadZivkovic Sometimes I'll gave to query for the GUID itself, and I forgot to say that I'll have datetimes on these tables, and these datetimes (coupled with some GUIDs as FKs on `JOIN`) should be a part of the most common queries, in the `WHERE` part. In this scenario should I use the datetimes as clustered indexes even thought the GUIDs will be used as FKs? – Liordino Neto Aug 23 '13 at 13:30
  • @AaronBertrand yes, that is an option too. But what are the usual approachs to migrate the identity columns? The only thing that I could thought is to add the records with the identity int PK, retrieve the db generated identity and cascade it to the referencing tables when inserting them. – Liordino Neto Aug 23 '13 at 13:32
  • Yes, when you insert a single row you can use SCOPE_IDENTITY() or when you insert multiple rows you can use the OUTPUT clause. When you migrate the data (if you ever migrate the data) you can just start the identity values where they left off (the existing values will all still match) and use whatever technique is available on the destination platform (SEQUENCE in Oracle, AUTOINCREMENT in MySQL, etc). However are you really going to migrate to a different platform? I find that most people who worry about this tend to never migrate anywhere. – Aaron Bertrand Aug 23 '13 at 13:35
  • @AaronBertrand no, I'm not migrating to a different platform, but instead to another database, with the same schema and already with some data (from other sources), but still with SQL Server 2012. The only worry here is that, since there will exist some data in the destination database I could not keep the primary key values of the source, and would have to change its value to the next identity, cascading it to the referencing tables. Still, I think it's an reasonable option (using the datetimes as clustered indexes). – Liordino Neto Aug 23 '13 at 13:50
  • 1
    You could still use IDENTITY values, just make them exclusive ranges. E.g. if you are starting from scratch on the new system, start at 1 billion, then there is no chance to overlap with the other system that started at 1 (unless you really do plan to consume more than 2 billion values, in which case use a BIGINT and spread it larger). After all, do you care whether the identity values start from 1 or not? It's not like users will have an easier time remembering GUIDs - in theory they shouldn't be exposed to users either. – Aaron Bertrand Aug 23 '13 at 13:56
  • I think you're right. The question that remains is if the int would be any better then the GUID on this respect, since my clustered index would be another column, and with the GUID I don't have to control the range size. Another approach that I'm reading [right now](https://groups.google.com/forum/?hl=en#!msg/microsoft.public.sqlserver.programming/qtCRhLLM9Kk/tg9vDfjbYW0J) is to create a compound of the logical columns on the table and make it the primary key, which would be shorter and reconstructable at any time. Maybe it could even be the clustered index, but I'm not sure of it. – Liordino Neto Aug 23 '13 at 14:10
  • Also, I'm not sure if I can guarantee that a primary key created this way is unique across many databases, or even in the same database (GUID is not really a 100% guarantee either, but at least the chances are minimal). I think that it would depend on the method to create it. – Liordino Neto Aug 23 '13 at 14:22

1 Answers1

1

// Plus, I would have to retrieve the db generated GUID instead of create it on the client side.//

((( Note, I wanted to put this as a comment, but the "code" was too big for a simple comment. But this ~could be an answer. )))

You can create a client side (mostly) sequential guid..... As more of a C# developer than a DBA, I like to create all my relationships outside the database, and then send everything down as Xml, let TSQL shred it, and then Upsert everything I need in one db hit (an occasional delete as well I guess). But there are shortcoming to this approach. I think int and bigint will usually "win" the performance contests......BUT........most of the time.........using guid with the client side "sequential" is plenty good enough for my needs. Especially, as you suggest...you have to move that data at some point. Getting PK/FK's to "lineup" using IDENTITY() is doable, but easier with GUID's (IMHO). Now, if I were writing T~ick3t M@ster or an Am@zing online retail website, I would not pick GUID.

Good luck.

using System;
using System.Runtime.InteropServices;


namespace MyCompany.MyTechnology.Framework.CrossDomain.GuidExtend
{
    public static class Guid
    {

        /*

        Original Reference for Code:
        http://www.pinvoke.net/default.aspx/rpcrt4/UuidCreateSequential.html

        */


        [DllImport("rpcrt4.dll", SetLastError = true)]
        static extern int UuidCreateSequential(out System.Guid guid);

        public static System.Guid NewGuid()
        {
            return CreateSequentialUuid();
        }


        public static System.Guid CreateSequentialUuid()
        {
            const int RPC_S_OK = 0;
            System.Guid g;
            int hr = UuidCreateSequential(out g);
            if (hr != RPC_S_OK)
                throw new ApplicationException("UuidCreateSequential failed: " + hr);
            return g;
        }


        /*

        Text From URL above:

        UuidCreateSequential (rpcrt4)

        Type a page name and press Enter. You'll jump to the page if it exists, or you can create it if it doesn't.
        To create a page in a module other than rpcrt4, prefix the name with the module name and a period.
        . Summary
        Creates a new UUID 
        C# Signature:
        [DllImport("rpcrt4.dll", SetLastError=true)]
        static extern int UuidCreateSequential(out Guid guid);


        VB Signature:
        Declare Function UuidCreateSequential Lib "rpcrt4.dll" (ByRef id As Guid) As Integer


        User-Defined Types:
        None.

        Notes:
        Microsoft changed the UuidCreate function so it no longer uses the machine's MAC address as part of the UUID. Since CoCreateGuid calls UuidCreate to get its GUID, its output also changed. If you still like the GUIDs to be generated in sequential order (helpful for keeping a related group of GUIDs together in the system registry), you can use the UuidCreateSequential function.

        CoCreateGuid generates random-looking GUIDs like these:

        92E60A8A-2A99-4F53-9A71-AC69BD7E4D75
        BB88FD63-DAC2-4B15-8ADF-1D502E64B92F
        28F8800C-C804-4F0F-B6F1-24BFC4D4EE80
        EBD133A6-6CF3-4ADA-B723-A8177B70D268
        B10A35C0-F012-4EC1-9D24-3CC91D2B7122



        UuidCreateSequential generates sequential GUIDs like these:

        19F287B4-8830-11D9-8BFC-000CF1ADC5B7
        19F287B5-8830-11D9-8BFC-000CF1ADC5B7
        19F287B6-8830-11D9-8BFC-000CF1ADC5B7
        19F287B7-8830-11D9-8BFC-000CF1ADC5B7
        19F287B8-8830-11D9-8BFC-000CF1ADC5B7



        Here is a summary of the differences in the output of UuidCreateSequential:

        The last six bytes reveal your MAC address 
        Several GUIDs generated in a row are sequential 
        Tips & Tricks:
        Please add some!

        Sample Code in C#:
        static Guid UuidCreateSequential()
        {
           const int RPC_S_OK = 0;
           Guid g;
           int hr = UuidCreateSequential(out g);
           if (hr != RPC_S_OK)
             throw new ApplicationException
               ("UuidCreateSequential failed: " + hr);
           return g;
        }



        Sample Code in VB:
        Sub Main()
           Dim myId As Guid
           Dim code As Integer
           code = UuidCreateSequential(myId)
           If code <> 0 Then
             Console.WriteLine("UuidCreateSequential failed: {0}", code)
           Else
             Console.WriteLine(myId)
           End If
        End Sub




        */








    }
}
granadaCoder
  • 26,328
  • 10
  • 113
  • 146
  • "Now, if I were writing T~ick3t M@ster or an Am@zing online retail website, I would not pick GUID." <- I think it's thanks to performance concerns... am I right? I also prefer creating the relationships outside the database. – Liordino Neto Aug 23 '13 at 14:42
  • This is kinda on par with "Should I use an ORM product?" (Like NHibernate or EntityFramework). For ~most applications, ORM "chattiness" is ok. But sometimes ORM chattiness is a performance killer. Kinda the same thing here. What is your most important aspect? I mean, seriously, are you actually writing an AMAZing ONline retailer? Probably not. If you want max performance, use int/bigint. But you'll pay the price of dealing with IDENTITIES are you move db to db. I've done just fine using Guid....and finding an alternative to Guid.NewGuid() (C#) and/or NewId() (tsql). – granadaCoder Aug 23 '13 at 14:59
  • You are absolutely right, what I'm writing is nowhere near an AMAZing ONline retailer. I think that the discussion here clarified the question for me. Thanks! :) – Liordino Neto Aug 23 '13 at 16:32
  • Some food for thought here: http://www.informit.com/articles/printerfriendly.aspx?p=25862 – Liordino Neto Aug 23 '13 at 17:35
  • Good reference Liordino. I'd like to see an update to include a sequential uuid. //You know what a nightmare it can be when doing a manual merge between two tables with INTEGERs as primary keys. Not only do you have to create a new sequence for the union of both tables, but you also must change all the foreign keys for all the dependent tables. If GUIDs are used, this sort of problem does not arise.// That's the big one for me! – granadaCoder Aug 23 '13 at 18:15
  • That's exactly what I'm trying to avoid! And I think I found my solution with the COMB implementation on the client side [here](http://stackoverflow.com/a/2187898/1262628). – Liordino Neto Aug 23 '13 at 18:34