
I have a database application in production, and all the tables use GUID primary keys which are currently set as the clustered indexes. I understand that this is a poor design due to performance considerations. I have been reading much on the topic, including this great article by Kimberly Tripp.

Can I improve the performance by simply creating an auto-incrementing index column of type INT and setting it to be the clustered index? I understand from Kimberly's article that all non-clustered indexes (like my GUID primary keys going forward, if I do this) will reference the clustered index. But will this actually improve performance if I'm searching for a record using the GUID primary key in the WHERE clause?

Also, will I have to populate the new column for existing records in the natural order of when the records were created in order to achieve a performance gain?
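For reference, the change I'm considering would look roughly like this (table and column names here are placeholders, not my actual schema):

```sql
-- Hypothetical table dbo.Orders, currently clustered on its GUID PK.
-- 1. Drop the existing clustered primary key (re-added as nonclustered below).
ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;

-- 2. Add the narrow, ever-increasing surrogate column.
ALTER TABLE dbo.Orders ADD OrderSeq INT IDENTITY(1,1) NOT NULL;

-- 3. Cluster on the new INT column.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderSeq ON dbo.Orders (OrderSeq);

-- 4. Keep the GUID as the primary key, now nonclustered.
ALTER TABLE dbo.Orders ADD CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderId);
```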

EDIT: To address whether this question is a duplicate of this other question: the other question is asking about the best practices in general regarding performance considerations for the use of a GUID primary key. No specific approaches are discussed. My question, on the other hand, is asking specifically whether adding an auto-incrementing index column of type INT will help to ameliorate the issues with a GUID primary key. Furthermore, my question then asks whether I will have to populate the new column in their "natural order" to realize the benefits, which, again, is not addressed in the other question due to its higher level of generality.

rory.ap
  • Yes, by significantly decreasing fragmentation thanks to a more suitable clustering key, your performance should be better. The GUIDs will be unique, so you're only ever pulling one record; even though there's an additional key lookup involved, over time the better fragmentation behavior should be beneficial – marc_s Jan 14 '15 at 19:57
  • Possible duplicate of [What are the best practices for using a GUID as a primary key, specifically regarding performance?](http://stackoverflow.com/questions/11938044/what-are-the-best-practices-for-using-a-guid-as-a-primary-key-specifically-rega) – AHiggins Jan 07 '16 at 14:18
  • @AHiggins -- Please see my edit. – rory.ap Jan 07 '16 at 14:33

1 Answer


There are a few things to consider:

  1. Yes, you're correct: the clustered index keys will be present in all nonclustered indexes. Having a smaller clustering key will save space both on disk and in the buffer pool.

  2. Clustering on an identity column gives you end-of-table inserts, which can (depending on load) make the last page an insertion hotspot. The GUIDs, as they stand, produce random inserts, which avoid that hotspot but cause more page splits, which may also adversely affect performance.

  3. To answer the question of improving performance: what is your current problem area? Do you have any data we can go off of? If you don't have any problems now, it may not be worth the change.

  4. When you add the column as an identity, it will seed itself, and the order really shouldn't matter.

  5. If you do use an INT column for the key, create a unique non-clustered index on the GUID column to let the optimizer know there will only be a single value (optimization) and to allow for a quick seek. Make it covering if it isn't too expensive.
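Point 5 might be sketched like this (the table, column, and included columns are illustrative; include only columns your common queries actually need):

```sql
-- Unique nonclustered index on the GUID column so the optimizer knows
-- at most one row matches, enabling a single-row seek instead of a scan.
CREATE UNIQUE NONCLUSTERED INDEX UX_Orders_OrderId
    ON dbo.Orders (OrderId)
    INCLUDE (CustomerId, OrderDate);  -- optional: makes the index covering
```

With the INCLUDE columns, queries that filter on the GUID and select only those columns can be satisfied entirely from this index, skipping the key lookup against the clustered index.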

  • I would argue that an "insert hotspot" is **much less** of a performance killer than frequent page splits! Hotspots used to be a problem in the 6.5/7.0 releases - not really anymore, from what I learned. But Page Splits are extremely expensive and messy affairs - to be avoided, if ever possible! – marc_s Jan 14 '15 at 19:55
  • I have found that dropping the fill factor on GUID indexes can alleviate the problem of page splits to some extent, especially on large tables. Dropping the fill factor reserves more space in the pages, but your indexes start getting bigger. In these cases I would suggest creating a composite natural key as the clustered index, which will also sort the table into a natural order. – Namphibian Jan 14 '15 at 20:35
  • @marc_s insertion hotspots lead to latch contention, which can definitely lower performance. Page splits are worse, for sure, but it's something to think about. SQLCAT has a partitioned hash table approach to split up hotspots, but it has its own problems. –  Jan 14 '15 at 23:47
  • @Namphibian - You can absolutely do that, and lower the pad_index value as well so that page splits aren't quite as expensive, since the root and intermediate pages won't have to split as often. The problem with that is you're going to be wasting all that space, not only on disk but in the buffer pool as well, ramping up IO costs. –  Jan 14 '15 at 23:47
  • @SeanGallardy What is the `hash partitioning` problem you mentioned above? Any references – LCJ Sep 21 '16 at 14:33
  • @Lijo Nothing in text; the main issue I found at different databases was the hash function taking down DML performance due to the complexity (overhead) involved. Your mileage will most certainly vary in each specific application and situation, and your app should be designed to look up based on that hash, or you're getting virtually zero benefit from partitioning. –  Sep 22 '16 at 01:55