3

I've been handed a task to improve query performance on a table.

The primary key is a GUID that is created by application code, so is not sequential, and there is no separate sequential clustering key on the table.

My feeling is that it is the choice of a non-sequential GUID as the primary and clustering key that is the main culprit for poor performance. I intend to drop the clustered index on the GUID and add an INT IDENTITY as the clustering key.

The table has ~3 million rows in it.

Is it better to try and alter the table, or to create a new table, copy the existing data to it, remove the old table, and rename the new table?

EDIT: Copying 3 million rows is taking a very long time. Would dropping the index be quicker?

EDIT 2: Decided to solve the slow copying problem with hardware, and threw 20 cores at it instead of 4. It's a lot quicker now, though still far slower than i'd expect. I'd estimate it will take 30 minutes to copy the 3 million rows.

Would still appreciate a solution though as this is just a test, i still have to do it on the production server, i'd i'd rather not have it down for longer than necessary.

For info, based on advice by @ughai, my autogrowth settings are now at 500Mb.

This kind of thing is not really my specialty so would appreciate some advice as to what is the best way of going around this.

If it's relevant, the bulk of the queries that take place on this table have no joins.

EDIT: Original Table Schema

 CREATE TABLE [dbo].[IODBTaskHistory](
    [Id] [uniqueidentifier] NOT NULL,
    [Tag] [nvarchar](250) NULL,
    [Type] [int] NOT NULL,
    [SourceFilePath] [nvarchar](max) NOT NULL,
    [DestinationFilePath] [nvarchar](max) NULL,
    [Priority] [int] NOT NULL,
    [State] [int] NOT NULL,
    [SubState] [int] NOT NULL,
    [StateDescription] [nvarchar](max) NULL,
    [Progress] [decimal](5, 2) NOT NULL,
    [Date_Created] [datetime] NOT NULL,
    [Date_Queued] [datetime] NULL,
    [Date_Started] [datetime] NULL,
    [Date_Finished] [datetime] NULL,
    [Date_LastUpdated] [datetime] NULL,
    [Optional_ParentDependancyTaskId] [uniqueidentifier] NULL,
    [Optional_isParentSuccessRequired] [bit] NULL,
    [Transfer_ProgressBytes] [float] NULL,
    [Transfer_SpeedCurrentBps] [float] NULL,
    [Transfer_SpeedIntervals] [nvarchar](max) NULL,
    [IODrone_Id] [uniqueidentifier] NULL,
    [IODrone_Version] [nvarchar](max) NULL,
    [Action] [int] NOT NULL,
    [Date_TransferStarted] [datetime] NULL,
    [Optional_NotificationEmails] [nvarchar](max) NULL,
    [MaxRetryCount] [int] NULL,
    [CurrentRetryCount] [int] NULL,
    [Impersonation_Username] [nvarchar](200) NOT NULL,
    [Impersonation_Password] [nvarchar](max) NOT NULL,
    [AllowRewrite] [bit] NOT NULL CONSTRAINT [DF_IODBTaskHistory_AllowRewrite]  DEFAULT ((0)),
    [SubTag] [nvarchar](255) NULL,
    [SourceLengthBytes] [bigint] NULL CONSTRAINT [DF_IODBTaskHistory_SourceLengthBytes2]  DEFAULT ((0)),
    [IODrone_Thread] [int] NULL,
    [Date_FileSizeFetched] [datetime] NULL,
    [Date_StornextTapeRetrievalStarted] [datetime] NULL,
    [Date_StornextTapeRetrievalFinished] [datetime] NULL,
    [IOServiceAddress] [nvarchar](20) NULL,
    [LogString] [nvarchar](max) NULL,
    [NotesString] [nvarchar](max) NULL,
    [TX_Date] [datetime] NULL,
    [SlowDownUpload] [bit] NULL CONSTRAINT [DF_IODBTaskHistory_SlowDownUpload]  DEFAULT ((0)),
 CONSTRAINT [PK_IODBTaskHistory] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

Destination Table Schema

CREATE TABLE [dbo].[IODBTaskHistoryNew](
    [Id] [uniqueidentifier] NOT NULL,
    [ClusterKey] [int] IDENTITY(1,1) NOT NULL,
    [Tag] [nvarchar](250) NULL,
    [Type] [int] NOT NULL,
    [SourceFilePath] [nvarchar](max) NOT NULL,
    [DestinationFilePath] [nvarchar](max) NULL,
    [Priority] [int] NOT NULL,
    [State] [int] NOT NULL,
    [SubState] [int] NOT NULL,
    [StateDescription] [nvarchar](max) NULL,
    [Progress] [decimal](5, 2) NOT NULL,
    [Date_Created] [datetime] NOT NULL,
    [Date_Queued] [datetime] NULL,
    [Date_Started] [datetime] NULL,
    [Date_Finished] [datetime] NULL,
    [Date_LastUpdated] [datetime] NULL,
    [Optional_ParentDependancyTaskId] [uniqueidentifier] NULL,
    [Optional_isParentSuccessRequired] [bit] NULL,
    [Transfer_ProgressBytes] [float] NULL,
    [Transfer_SpeedCurrentBps] [float] NULL,
    [Transfer_SpeedIntervals] [nvarchar](max) NULL,
    [IODrone_Id] [uniqueidentifier] NULL,
    [IODrone_Version] [nvarchar](max) NULL,
    [Action] [int] NOT NULL,
    [Date_TransferStarted] [datetime] NULL,
    [Optional_NotificationEmails] [nvarchar](max) NULL,
    [MaxRetryCount] [int] NULL,
    [CurrentRetryCount] [int] NULL,
    [Impersonation_Username] [nvarchar](200) NOT NULL,
    [Impersonation_Password] [nvarchar](max) NOT NULL,
    [AllowRewrite] [bit] NOT NULL,
    [SubTag] [nvarchar](255) NULL,
    [SourceLengthBytes] [bigint] NULL,
    [IODrone_Thread] [int] NULL,
    [Date_FileSizeFetched] [datetime] NULL,
    [Date_StornextTapeRetrievalStarted] [datetime] NULL,
    [Date_StornextTapeRetrievalFinished] [datetime] NULL,
    [IOServiceAddress] [nvarchar](20) NULL,
    [LogString] [nvarchar](max) NULL,
    [NotesString] [nvarchar](max) NULL,
    [TX_Date] [datetime] NULL,
    [SlowDownUpload] [bit] NULL,
PRIMARY KEY NONCLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
UNIQUE CLUSTERED 
(
    [ClusterKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

ALTER TABLE [dbo].[IODBTaskHistoryNew] ADD  CONSTRAINT [DF_IODBTaskHistory_AllowRewriteNew]  DEFAULT ((0)) FOR [AllowRewrite]
GO

ALTER TABLE [dbo].[IODBTaskHistoryNew] ADD  CONSTRAINT [DF_IODBTaskHistory_SourceLengthBytes2New]  DEFAULT ((0)) FOR [SourceLengthBytes]
GO

ALTER TABLE [dbo].[IODBTaskHistoryNew] ADD  CONSTRAINT [DF_IODBTaskHistory_SlowDownUploadNew]  DEFAULT ((0)) FOR [SlowDownUpload]
GO

MY Copy query

    INSERT INTO [dbo].[IODBTaskHistoryNew]
           ([Id]
           ,[Tag]
           ,[Type]
           ,[SourceFilePath]
           ,[DestinationFilePath]
           ,[Priority]
           ,[State]
           ,[SubState]
           ,[StateDescription]
           ,[Progress]
           ,[Date_Created]
           ,[Date_Queued]
           ,[Date_Started]
           ,[Date_Finished]
           ,[Date_LastUpdated]
           ,[Optional_ParentDependancyTaskId]
           ,[Optional_isParentSuccessRequired]
           ,[Transfer_ProgressBytes]
           ,[Transfer_SpeedCurrentBps]
           ,[Transfer_SpeedIntervals]
           ,[IODrone_Id]
           ,[IODrone_Version]
           ,[Action]
           ,[Date_TransferStarted]
           ,[Optional_NotificationEmails]
           ,[MaxRetryCount]
           ,[CurrentRetryCount]
           ,[Impersonation_Username]
           ,[Impersonation_Password]
           ,[AllowRewrite]
           ,[SubTag]
           ,[SourceLengthBytes]
           ,[IODrone_Thread]
           ,[Date_FileSizeFetched]
           ,[Date_StornextTapeRetrievalStarted]
           ,[Date_StornextTapeRetrievalFinished]
           ,[IOServiceAddress]
           ,[LogString]
           ,[NotesString]
           ,[TX_Date]
           ,[SlowDownUpload])
    SELECT [Id]
      ,[Tag]
      ,[Type]
      ,[SourceFilePath]
      ,[DestinationFilePath]
      ,[Priority]
      ,[State]
      ,[SubState]
      ,[StateDescription]
      ,[Progress]
      ,[Date_Created]
      ,[Date_Queued]
      ,[Date_Started]
      ,[Date_Finished]
      ,[Date_LastUpdated]
      ,[Optional_ParentDependancyTaskId]
      ,[Optional_isParentSuccessRequired]
      ,[Transfer_ProgressBytes]
      ,[Transfer_SpeedCurrentBps]
      ,[Transfer_SpeedIntervals]
      ,[IODrone_Id]
      ,[IODrone_Version]
      ,[Action]
      ,[Date_TransferStarted]
      ,[Optional_NotificationEmails]
      ,[MaxRetryCount]
      ,[CurrentRetryCount]
      ,[Impersonation_Username]
      ,[Impersonation_Password]
      ,[AllowRewrite]
      ,[SubTag]
      ,[SourceLengthBytes]
      ,[IODrone_Thread]
      ,[Date_FileSizeFetched]
      ,[Date_StornextTapeRetrievalStarted]
      ,[Date_StornextTapeRetrievalFinished]
      ,[IOServiceAddress]
      ,[LogString]
      ,[NotesString]
      ,[TX_Date]
      ,[SlowDownUpload]
  FROM [dbo].[IODBTaskHistory]

Execution Plan

enter image description here

If it's not clear form the picture 99% of the plan is spent on the clustered index insert on the new Identity column

MrBliz
  • 5,830
  • 15
  • 57
  • 81
  • 1
    how are you copying the data? using `INSERT INTO SELECT `? – ughai Apr 13 '15 at 10:54
  • 1
    I believe `INSERT INTO SELECT` should be quicker if you are not ordering the `SELECT` and there is no auto-growth. is there any DML query running in the background against this table? can you confirm there is no blocking – ughai Apr 13 '15 at 10:58
  • Not understanding your comment. I'm already using INSERT INTO SELECT, like you asked me first. – MrBliz Apr 13 '15 at 11:01
  • 1
    what I meant was if there is an `ORDER BY` in your `INSERT INTO SELECT` ? can you post your query which you are using – ughai Apr 13 '15 at 11:03
  • 1
    This is running on my local machine to test. Nothing else is being done at the moment. There is no order by in the select. Source table is 18GB of data so i'd expect it to take a a while, but not all day. AUTO-Growth settings are By 1MB, Unlimited Would SELECT INTO FROM be quicker? – MrBliz Apr 13 '15 at 11:15
  • 1
    Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/75097/discussion-between-ughai-and-mrbliz). – ughai Apr 13 '15 at 11:17
  • 1
    3 million rows is relatively small for sql server. you should try running the queries that takes a long time directly from SSMS and look at the query plan. this will provide you some data to start from. – Zohar Peled Apr 13 '15 at 11:28
  • Just to rule out the obvious: did you verify that the PK actually is defined as "CLUSTERED"? –  Apr 13 '15 at 13:10
  • Can confirm the PK is CLUSTERED – MrBliz Apr 13 '15 at 13:12
  • Normally copying rows isn't "very slow". 100k rows per second per index are a typical rate. – usr Apr 13 '15 at 13:26
  • That's what i'd expect, but i'm looking at 100,000 rows a minute. – MrBliz Apr 13 '15 at 13:27
  • The optimal strategy to execute this migration depends on the schema. The copy strategy certainly will not be excessively slow. Are you OK with 5min of downtime? This should never take longer than that. Post schema, code and execution plan. – usr Apr 13 '15 at 13:29
  • Offtopic - re "My feeling is that it is the choice of a non-sequential GUID as the primary and clustering key that is the main culprit for poor performance." - is your feeling backed by hard data? What kind of queries you perform on your table? If for example you always query by some date field/range, then consider using that date field as clustered index instead. – Arvo Apr 13 '15 at 13:51
  • There are non clustered indexes on the fields that are regularly searched. – MrBliz Apr 13 '15 at 13:55
  • The query plan is fine. Rebuild the existing CI prior to performing the migration. That way we can make sure that fragmentation does not artificially lengthen the copy. Run the query and report the speed in rows/sec. If the speed is really low we need to dig deeper. – usr Apr 13 '15 at 14:39

1 Answers1

0

In my experience the fastest way is to drop the existing clustered index

drop index index_name on tablename;

Then recreate the clustered index:

create clustered index indexname on tablename(columnname1, columnanme2);

If you are going to copy the data then copy into a target table that does not contain any indexes. After all the inserts are done then create the clustered index first and then all the other indexes. If you create the indexes before the inserts you will likely have lots, and lots, of page splits which can take a long time.

benjamin moskovits
  • 5,261
  • 1
  • 12
  • 22
  • Page splits are not going to be an issue on the clustered index since it is on an identity column, which also means that creating the clustered index *after* inserting the data does not make sense, may as well create the clustered index first and only have to write the data twice. – GarethD Apr 13 '15 at 13:08