
In SQL Server 2008 I have tried to reproduce the results from the experiments on clustered indexes over sequential vs. non-sequential GUID keys described here: http://sqlblog.com/blogs/denis_gobo/archive/2009/02/05/11743.aspx. However, I do not see the significant speedup for insertions that I would expect (and that the author reports). Page utilization is clearly improved with the sequential GUID, but for some reason inserting 10,000 rows is only around 100 ms faster (out of 10,300 ms).

I use the following code:

CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
SomeDate DATETIME, batchNumber BIGINT)

CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
SomeDate DATETIME, batchNumber BIGINT)

CREATE CLUSTERED INDEX ix_id1 ON TestGuid1(Id)
CREATE CLUSTERED INDEX ix_id2 ON TestGuid2(Id)

GO

SET NOCOUNT ON
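-- GO 10000 below is an SSMS/sqlcmd feature, not T-SQL: it re-runs the preceding
-- batch 10,000 times, i.e. 10,000 single-row INSERTs, each committed individually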
INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),3)
GO 10000

SET NOCOUNT ON
INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),3)
GO 10000

DBCC SHOWCONTIG ('TestGuid1') WITH TABLERESULTS
DBCC SHOWCONTIG ('TestGuid2') WITH TABLERESULTS

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid1
GROUP BY batchNumber

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate))
FROM TestGuid2
GROUP BY batchNumber

Can anyone explain why I do not see a more significant speedup for insertions into TestGuid2?

Follow-up: As requested in the thread below, I have expanded the test. The results tend to vary significantly over time, so the experiments are now repeated N times and the total and average time usage reported. I have also added a third test: primary keys on sequential integer columns. This should be the fastest and most compact of all three methods, since the integer type is smaller and IDENTITY(1,1) is (or at least should be) fast. At least by my intuition. The average execution time now favors the sequential GUID, but surprisingly, insertions in the third experiment (with sequential integer keys) are slower than with sequential GUIDs. I have no explanation for this. Here is the code for the new experiments:

SET NOCOUNT ON

CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))

CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))

CREATE TABLE TestInt (Id Int NOT NULL identity(1,1) PRIMARY KEY,
SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))

DECLARE @BatchCounter INT = 1
DECLARE @NumRows INT = 100000
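-- 20 batches; each batch inserts @NumRows rows into each of the three tables
-- inside one explicit transaction, so individual commits do not dominate the timings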


WHILE (@BatchCounter <= 20)
BEGIN 
BEGIN TRAN

DECLARE @LocalCounter INT = 0

    WHILE (@LocalCounter < @NumRows)
    BEGIN
    INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
    SET @LocalCounter +=1
    END

SET @LocalCounter = 0

    WHILE (@LocalCounter < @NumRows)
    BEGIN
    INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
    SET @LocalCounter +=1
    END

SET @LocalCounter = 0

    WHILE (@LocalCounter < @NumRows)
    BEGIN
    INSERT TestInt (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
    SET @LocalCounter +=1
    END

SET @BatchCounter +=1
COMMIT 
END

DBCC SHOWCONTIG ('TestGuid1') WITH TABLERESULTS
DBCC SHOWCONTIG ('TestGuid2') WITH TABLERESULTS
DBCC SHOWCONTIG ('TestInt') WITH TABLERESULTS

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWID()]
FROM TestGuid1
GROUP BY batchNumber

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWSEQUENTIALID()]
FROM TestGuid2
GROUP BY batchNumber

SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [IDENTITY()]
FROM TestInt
GROUP BY batchNumber

DROP TABLE TestGuid1
DROP TABLE TestGuid2
DROP TABLE TestInt
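The averages reported below were computed with a separate query, run before the DROP TABLE statements; something along these lines, shown here for TestGuid1:

SELECT AVG(CAST(DurationMs AS FLOAT)) AS AvgMs -- mean per-batch duration; not part of the timed script
FROM (SELECT DATEDIFF(ms, MIN(SomeDate), MAX(SomeDate)) AS DurationMs
      FROM TestGuid1
      GROUP BY batchNumber) AS PerBatch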

And the average execution times (in ms):

NEWID()            3064
NEWSEQUENTIALID()  1977
IDENTITY()         2223

The page usage is as follows:

Table          Pages  Average page density (%)
----------------------------------------------
TestGuid1      50871  68.4
TestGuid2      35089  99.2
TestInt        32259  98.7

I fail to see why these page statistics (which are best for TestInt) don't imply that experiment three is the fastest.
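For what it's worth, the same page statistics can also be read from sys.dm_db_index_physical_stats (which supersedes DBCC SHOWCONTIG); a minimal query for one of the tables, run before it is dropped:

SELECT OBJECT_NAME(ips.object_id) AS TableName,
       ips.page_count,
       ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('TestGuid1'), NULL, NULL, 'SAMPLED') AS ips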

someName
  • Have you tried running the inserts inside an explicit transaction? At the moment the effect of all the individual commits will likely dominate the results. (BTW, it is important to put `BEGIN TRAN` in a batch before the `GO` so you don't end up executing it 10,000 times, which would need an equal number of `COMMIT`s to bring `@@TRANCOUNT` down to zero.) – Martin Smith May 12 '11 at 11:22
  • Chances are your sample size is much too small: at only 10,000 rows, reordering the table takes SQL Server such a short amount of time that you won't see much on any one insert. Try upping it to 100,000 or 1,000,000 rows. When dealing with tables of that size, clustered index selection is a much more meaningful choice. – JStead May 12 '11 at 11:25
  • Kimberly Tripp nicely (and extensively!) shows in her blog post [Disk space is cheap - that's **NOT** the point!!](http://www.sqlskills.com/BLOGS/KIMBERLY/post/Disk-space-is-cheap.aspx) how badly GUIDs - random or sequential - compare against using `INT IDENTITY` as a clustering index on your table – marc_s May 12 '11 at 11:26
  • Encapsulating everything in a transaction (using a WHILE loop instead of `go 10000`), inserting 100,000 rows instead of 10,000, and adding a filler column of type char(100) does not change the timing picture to the benefit of sequential GUIDs. Rather the contrary: with these modifications the insertions take 4800 vs. 2276 ms (sequential GUIDs are surprisingly the slowest...) – someName May 12 '11 at 11:59
  • How many tests have you done? The results in my answer show each individual run can have quite wide variation but when I take an average `NEWSEQUENTIALID` comes out ahead. – Martin Smith May 12 '11 at 12:57
  • I only did one test run. This was, as I have just elaborated in the question, not enough. – someName May 13 '11 at 11:00
  • I have now expanded the test to cover a third experiment on sequential integer keys, and the result is, well, surprising. At least to me. :-) – someName May 13 '11 at 11:07

2 Answers


Can you try this modified script and post your results?

    SET NOCOUNT ON

    CREATE TABLE TestGuid1 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY,
    SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))

    CREATE TABLE TestGuid2 (Id UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY,
    SomeDate DATETIME, batchNumber BIGINT, FILLER CHAR(100))

    DECLARE @BatchCounter INT = 1

    WHILE (@BatchCounter <= 20)
    BEGIN 
    BEGIN TRAN

    DECLARE @LocalCounter INT = 0

        WHILE (@LocalCounter < 100000)
        BEGIN
        INSERT TestGuid1 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
        SET @LocalCounter +=1
        END

    SET @LocalCounter = 0

        WHILE (@LocalCounter < 100000)
        BEGIN
        INSERT TestGuid2 (SomeDate,batchNumber) VALUES (GETDATE(),@BatchCounter)
        SET @LocalCounter +=1
        END

    SET @BatchCounter +=1
    COMMIT 
    END

    DBCC SHOWCONTIG ('TestGuid1') WITH TABLERESULTS
    DBCC SHOWCONTIG ('TestGuid2') WITH TABLERESULTS

    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWID()]
    FROM TestGuid1
    GROUP BY batchNumber

    SELECT batchNumber,DATEDIFF(ms,MIN(SomeDate),MAX(SomeDate)) AS [NEWSEQUENTIALID()]
    FROM TestGuid2
    GROUP BY batchNumber

    DROP TABLE TestGuid1
    DROP TABLE TestGuid2

I see quite wildly varying results between individual runs (on my laptop, not a server!), but a definite trend for the sequential GUID to be faster.

NEWID() Average 5168.9 ms

batchNumber          NEWID()
-------------------- -----------
1                    4270
2                    2480
3                    2706
4                    3333
5                    7480
6                    5346
7                    4306
8                    7713
9                    7313
10                   4760
11                   4680
12                   4113
13                   3433
14                   2686
15                   4963
16                   8040
17                   5313
18                   8160
19                   9533
20                   2750

NEWSEQUENTIALID() Average 3000.85 ms

batchNumber          NEWSEQUENTIALID()
-------------------- -----------------
1                    2016
2                    1820
3                    1886
4                    1870
5                    4873
6                    3473
7                    3730
8                    3690
9                    1983
10                   2020
11                   1906
12                   5596
13                   2100
14                   1950
15                   2096
16                   1876
17                   5196
18                   2110
19                   2113
20                   7713
Martin Smith
  • +1 - I was working on something similar but you got it first and better :) – JNK May 12 '11 at 12:59
  • It would probably also be a good idea to size the data file; growing the files might skew the results – SQLMenace May 12 '11 at 15:43
  • @SQLMenace - True, that might explain some of the wide variation in timings, but it might also be explained by other applications I've got running. – Martin Smith May 12 '11 at 15:49
  • Of course, when I wrote that blog post it was run against SQL Server 2005. Anyway, it is still better to use NEWSEQUENTIALID(), because reads will be faster and tables will use less space – SQLMenace May 12 '11 at 15:51

Since I wrote the original blog post, I decided to run your code. Here is what I get:

3   8726  -- NEWID()
3   12550 -- NEWSEQUENTIALID()

Remember, I am running this on a server with 32 GB of RAM and 8 processors, not on a laptop.

On my local machine, I see almost no difference between the two.

Remember that, besides inserts, reads will also be much slower, because the table is fragmented.
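A quick way to see that penalty is to compare logical reads for a scan of each table; a sketch, assuming the test tables from the question are still around:

SET STATISTICS IO ON

SELECT COUNT(*) FROM TestGuid1 -- random GUIDs: many more, half-empty pages to read
SELECT COUNT(*) FROM TestGuid2 -- sequential GUIDs: fewer, denser pages

SET STATISTICS IO OFF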

Here is what I get when running Martin's script on the server:

batchNumber NEWID()
17  1696
19  1706
14  1680
16  1706
5   1660
6   1890
7   1650
8   1663
13  1673
15  1683
2   1656
9   1673
20  1750
1   2033
3   1673
10  1673
12  1670
4   1650
11  1690
18  1696

batchNumber NEWSEQUENTIALID()
2   1276
9   1260
20  1290
13  1266
15  1280
17  1266
19  1266
5   1260
6   1266
7   1260
8   1260
1   1243
3   1256
10  1270
12  1263
14  1266
16  1276
4   1256
11  1270
18  1270

Here is what happens on my desktop; the files are not pre-sized, by the way (see the sizing sketch after these results):

batchNumber NEWID()
1   9470
2   4446
3   5996
4   3860
5   4170
6   2403
7   3283
8   3573
9   1883
10  3980
11  2580
12  2780
13  1643
14  2836
15  3250
16  4303
17  3250
18  3376
19  8723
20  2616

batchNumber NEWSEQUENTIALID()
1   2566
2   1336
3   1256
4   3123
5   3023
6   1166
7   2396
8   1180
9   2386
10  3896
11  3790
12  3066
13  1396
14  2010
15  1183
16  3110
17  4060
18  4260
19  1896
20  2013
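If you want to rule out file growth, you can pre-size the data and log files before running the test, something like this (the database and logical file names are placeholders, and the sizes are guesses):

ALTER DATABASE TestDb MODIFY FILE (NAME = TestDb_dat, SIZE = 10GB) -- placeholder names and sizes
ALTER DATABASE TestDb MODIFY FILE (NAME = TestDb_log, SIZE = 2GB)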
SQLMenace
  • Somewhat surprising! Do you get the same result if you try the code in my answer? Is `newid()` consistently faster for you on your server? – Martin Smith May 12 '11 at 15:39
  • I have now repeated the experiments and the results now (almost) fit with the theory. See my extension of the original question formulation. However, I also expanded the test to cover an even more compact and therefore (in theory) faster method: sequential **integer** keys. But to my surprise, this is slower than sequential GUIDs. I have no explanation for this. – someName May 13 '11 at 11:03