
I am in a big mess and am looking to see if I can get out of it. I have a database with all primary keys defined on uniqueidentifier columns. This was more or less forced on us, with 'security' given as one of the reasons. The other, arguably worthier, reason is that some of the tables participate in replication.

I am reviewing the database, and I feel one easily avoidable future performance bottleneck could be addressed by adding auto-increment bigint columns to all tables and making them the (clustered) primary keys, then somehow 're-attaching' the PK-FK relationships properly, while still retaining the old columns for any future use.

Any suggestions/comments/don'ts in this regard? Ours is a C#/MS SQL Server R2/Linq environment.

EDIT: Looking at the comments, I realize I left out a few important details. All primary key GUID fields are clustered, and no, I am not using NEWSEQUENTIALID() (we use Linq to SQL and primary keys are generated client side; with replication involved, we weren't sure of a way to correctly generate sequential IDs from different client environments without conflicts).

My 'feeling' comes from the known fact that a clustered index on a GUID column causes a high degree of fragmentation, which will only worsen as the database grows.
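To put a number on that feeling, something like the following reports the fragmentation of a clustered index (the table name `dbo.Orders` is just a placeholder; substitute any table with a GUID clustered PK):

```sql
-- Logical fragmentation and page fullness of the clustered index (index_id = 1).
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.avg_page_space_used_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.index_id = 1;
```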

Also, I am not really trying to optimize right now, but trying to correct a bad design to avoid future headaches when the database gets too large. I wanted to know if it is worth doing it now.

Helpful discussion related to the issue can also be found in this post, and another.

Brian

  • `and I feel one easily avoidable performance bottleneck is to may add auto-increment bigint`, a little advice. Don't optimize on a feeling. – Mike Perrenoud Aug 28 '13 at 13:17
  • Benchmark your database. Replicate the data to another table with a `BIGINT` and see if the performance loss is really that much of an issue. – Dustin Kingen Aug 28 '13 at 13:18
  • p.s. this may be better for http://dba.stackexchange.com/ – Brad Christie Aug 28 '13 at 13:20
  • Are the GUIDs used as clustered indexes? Are you using sequential GUIDs? – Lieven Keersmaekers Aug 28 '13 at 13:20
  • Is the GUID column a clustered index? – Jedediah Aug 28 '13 at 13:20
  • @LievenKeersmaekers, good question. Does that ubiquitous 13th digit have a 1 or a 4? :D – Mike Perrenoud Aug 28 '13 at 13:21
  • I would advise moving to sequential GUIDs and performance tuning at that point, before upending the whole thing to move to bigint. http://stackoverflow.com/questions/18403632/alternatives-to-guid-for-clustered-index/18405271#18405271 – granadaCoder Aug 28 '13 at 13:39
  • @neoistheone Question edited. – Brian Aug 28 '13 at 14:13
  • I would recommend taking the advice of @neoistheone, then looking to remove the clustered indexes and choosing a more suitable column/set of columns to cluster on once you have done your analysis. Lots of information here as well: http://stackoverflow.com/questions/277625/should-i-get-rid-of-clustered-indexes-on-guid-columns – SteveB Aug 28 '13 at 14:15
  • Deletions will cause fragmentation regardless of what your clustered index is. What you get with a non-sequential clustered index is low average page fullness from inserts alone, because added rows get inserted all over and end up splitting your pages. I have a large table that never has deletes and uses GUIDs for the clustered index; the average page fullness is around 63%. A sequential clustered index in the same circumstance will have page fullness close to 99%. But this is only one of many factors affecting performance. – hatchet - done with SOverflow Aug 28 '13 at 15:38

3 Answers


Performance tuning a database is conceptually simple, but it's hard work. First, you need to gather the list of statements being executed against the database by running a long-running profiler trace over the course of at least one business day, but ideally two.

Save that trace to a table so it can be queried; that way you can easily find the DISTINCT queries executed against your database.

After determining which ones execute the most, analyze their execution plans. It's likely the problem has nothing to do with the GUIDs and everything to do with either the queries themselves (i.e. they're just terrible) or with needing a different index.
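As a sketch, finding the heaviest statements could look like this (assuming the trace was saved to a table named `dbo.ProfilerTrace`, which is a placeholder; `TextData` and `Duration` are standard trace columns):

```sql
-- Top statements by execution count from a saved profiler trace.
SELECT TOP (50)
       CAST(TextData AS NVARCHAR(4000)) AS statement_text,
       COUNT(*)      AS executions,
       SUM(Duration) AS total_duration
FROM dbo.ProfilerTrace
WHERE TextData IS NOT NULL
GROUP BY CAST(TextData AS NVARCHAR(4000))
ORDER BY executions DESC;
```

Note that parameterized calls with different literal values will group separately here; normalizing the text first gives a truer picture.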

Things to watch out for:

  1. Views that are heavily filtered with a WHERE clause. These are fantastic candidates for either Stored Procedures or Parameterized Views.
  2. Statements that JOIN to very large tables, those at times can be good candidates for sub-queries instead. This does depend on the execution plan.
  3. Statements that appear to be executed multiple times. This is often a good sign that the application itself just isn't doing a good job managing how often it makes round-trips to the server. I've seen applications that will run the same query 10+ times.
Mike Perrenoud
  • Thank you. That's very helpful. I will add these steps to my todo list when it is time for performance tuning. Btw, I didn't quite get your comment to LievenKeersmaekers? – Brian Aug 28 '13 at 14:29
  • @Brian, read this entire series about GUIDs from Eric Lippert, http://blogs.msdn.com/b/ericlippert/archive/tags/guids/, that digit lets you know if the GUID is sequential or random. Most GUIDs are based off of the MAC address and therefore they are actually sequential - but it's a common misunderstanding. – Mike Perrenoud Aug 28 '13 at 14:32

A stopgap measure could be to use NEWSEQUENTIALID() instead of NEWID(). At least this way you won't get as much fragmentation.

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique. When a GUID column is used as a row identifier, using NEWSEQUENTIALID can be faster than using the NEWID function. This is because the NEWID function causes random activity and uses fewer cached data pages. Using NEWSEQUENTIALID also helps to completely fill the data and index pages.

http://technet.microsoft.com/en-us/library/ms189786.aspx

Note that this won't necessarily help resolve your performance woes depending on what they are (perhaps you can elaborate - INSERTs? SELECT filtering? etc.).
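For illustration, NEWSEQUENTIALID() can only be used as a column default, so the switch would look something like this (table and constraint names are just placeholders):

```sql
-- GUIDs are generated server-side, in roughly increasing order per machine,
-- so new rows append near the end of the clustered index instead of splitting pages.
CREATE TABLE dbo.Orders
(
    OrderId UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Orders_OrderId DEFAULT NEWSEQUENTIALID()
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    PlacedOn DATETIME2 NOT NULL
        CONSTRAINT DF_Orders_PlacedOn DEFAULT SYSUTCDATETIME()
);
```

The catch for your setup is that the default only fires when the client doesn't supply a value, so it won't help if Linq to SQL keeps generating the GUIDs client side.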

Mark Sowul
  • I agree. My comment (in the Original Question) has a "client side sequential uuid" solution, to go along with this suggestion. – granadaCoder Aug 28 '13 at 13:42

Have a look at Kimberly Tripp's advice at http://www.sqlskills.com/blogs/kimberly/disk-space-is-cheap/

She recommends the following steps for converting to a non-GUID clustering key:

If your CL key is a PK then here are your steps:

  1. Take the database offline (sorry, I’m just the messenger!!)
  2. Disable the FKs
  3. Disable the nonclustered indexes
  4. Drop the clustered PK (alter table)
  5. Optionally, add an identity column?
  6. Create the new clustered index
  7. Create the PK as nonclustered
  8. Enable the nonclustered indexes (alter index…rebuild)
  9. Enable the FKs with CHECK (this is very important)
  10. Bring the database online
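A rough T-SQL sketch of those steps, for one table (all database, table, index, and constraint names here are placeholders, and FK/index steps must be repeated for every affected object):

```sql
-- 1. Take the database "offline" for the duration of the change.
ALTER DATABASE MyDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- 2-3. Disable FKs referencing the table, and its nonclustered indexes.
ALTER TABLE dbo.Child NOCHECK CONSTRAINT FK_Child_Orders;
ALTER INDEX IX_Orders_Customer ON dbo.Orders DISABLE;

-- 4-5. Drop the clustered PK; optionally add an identity column to cluster on.
ALTER TABLE dbo.Orders DROP CONSTRAINT PK_Orders;
ALTER TABLE dbo.Orders ADD OrderSeq BIGINT IDENTITY(1,1) NOT NULL;

-- 6-7. New clustered index on the narrow key; PK comes back as nonclustered.
CREATE UNIQUE CLUSTERED INDEX CIX_Orders_OrderSeq ON dbo.Orders (OrderSeq);
ALTER TABLE dbo.Orders ADD CONSTRAINT PK_Orders
    PRIMARY KEY NONCLUSTERED (OrderId);

-- 8-9. Rebuild the nonclustered indexes; re-enable FKs WITH CHECK so they
-- are trusted again (plain CHECK without WITH CHECK leaves them untrusted).
ALTER INDEX IX_Orders_Customer ON dbo.Orders REBUILD;
ALTER TABLE dbo.Child WITH CHECK CHECK CONSTRAINT FK_Child_Orders;

-- 10. Bring the database back online.
ALTER DATABASE MyDb SET MULTI_USER;
```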
Hannah Vernon