0

Having inherited a system and addressing performance issues, noticed that for many tables (some with 1-2 million rows) the Primary Key is a GUID and new GUIDs are not set to be Sequential. Being aware of the additional fragmentation with this design, it was initially thought that we should address it to reduce system resources. We have not been able to establish if this is definitely something that should be done as we are using SSDs in a SAN disk system. Can someone advise if it is now acceptable to set up schema with such indexing if using modern disk systems? Does the old rulebook need amending?

cloudsafe
  • 2,444
  • 1
  • 8
  • 24
  • 1
    https://stackoverflow.com/a/11938495/575376 – juergen d Mar 17 '21 at 09:40
  • _"new GUIDs are not set to be Sequential"_ - can they be made sequential? – gvee Mar 17 '21 at 10:04
  • @gvee New rows are added using Entity Framework. The GUIDs are created in another system and passed to our aps to be used. So, no, we can't apply sequential. – cloudsafe Mar 17 '21 at 10:08
  • I've never used anything but uids as keys and clustered indexes.Unless your app has some very special needs your performance issues will have nothing to do with that. You need to investigate the performance issues, for example by looking at the plans of slow queries. – John Mar 17 '21 at 10:14
  • @John The theory is that if a clustered index is a guid, non-sequential rows will always cause disk fragmentation and overhead rearranging the pages on inserts. Perhaps you have an SSD disk system? – cloudsafe Mar 17 '21 at 10:19
  • 1
    @juergend The answer is from 2012. I can't remember but maybe SSD systems were not as prevalent then, so not taken into consideration when considering the answer? – cloudsafe Mar 17 '21 at 10:21
  • @cloudsafe Even without an SSD a fragmented B-tree will only appear slower for operations that scan the table. If those are the operations that make somebody have OP's performance issues, they have a more fundamental problem with scanning happening. – John Mar 17 '21 at 10:34
  • @cloudsafe, regardless of the storage subsystem, a random key will reduce buffer efficiency compared to a sequential value once the table is too large to fit into memory. Fast storage simply mitigates the impact. – Dan Guzman Mar 17 '21 at 11:01
  • @John There is an expense on page splits when inserting and this could cause longer locking. It is a heavily read/written to table. – cloudsafe Mar 17 '21 at 11:17
  • Pages will split either way, but if you have lots of concurrent writers uids will lead to a lesser chance of a writer having to wait on a page lock than if they are all trying to append with incrementing ids: In the first case, it can be any page, in the latter, all writers hit the tail page. – John Mar 17 '21 at 12:49
  • @John - there is also the case when the total size of the row exceeds the max for in-row data spaces (8060 bytes). If this is only by a couple of bytes then changing from a GUID to an INT could reduce the number of pages in use and improve buffer usage considerably (again depends on the query pattern - but let's assume ```SELECT *```). Not directly related to SSDs but to table design in general. Also partitioning could help with your scenario of a hot tail page. – Martin Cairney Mar 18 '21 at 07:13
  • @MartinCairney I don't have a hot tail page, I use uids. – John Mar 18 '21 at 09:27

0 Answers0