Summary: I have a table populated via the following:

insert into the_table (...) select ... from some_other_table

Running the above query with no primary key on the_table is ~15x faster than running it with a primary key, and I don't understand why.

The details: I think this is best explained through code examples.

I have a table:

create table the_table (
    a int not null,
    b smallint not null,
    c tinyint not null
);

If I add a primary key, this insert query is terribly slow:

alter table the_table
    add constraint PK_the_table primary key(a, b);

-- Inserting ~880,000 rows
insert into the_table (a,b,c)
    select a,b,c from some_view;

Without the primary key, the same insert query is about 15x faster. However, after populating the_table without a primary key, I can add the primary key constraint and that only takes a few seconds. This one really makes no sense to me.
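To reproduce the comparison on your own server, here is a minimal, self-contained sketch for SQL Server 2008 or later. It assumes a generated table, `dbo.src_rows`, standing in for `some_view`. It also uses two hypothetical target tables, `dbo.the_table_pk_first` and `dbo.the_table_pk_after`, so that both variants can run in one batch. The script only reports elapsed milliseconds and does not imply any particular result.

```sql
-- Hypothetical stand-in for some_view: 880,000 rows with unique (a, b) pairs.
IF OBJECT_ID('dbo.src_rows') IS NOT NULL DROP TABLE dbo.src_rows;
IF OBJECT_ID('dbo.the_table_pk_first') IS NOT NULL DROP TABLE dbo.the_table_pk_first;
IF OBJECT_ID('dbo.the_table_pk_after') IS NOT NULL DROP TABLE dbo.the_table_pk_after;

CREATE TABLE dbo.src_rows (a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL);

INSERT INTO dbo.src_rows (a, b, c)
SELECT n / 10, n % 10, n % 256            -- n = 10*a + b, so (a, b) is unique
FROM (SELECT TOP (880000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS o1
      CROSS JOIN sys.all_objects AS o2) AS numbers;

-- Variant 1: primary key (clustered by default) in place before the load.
CREATE TABLE dbo.the_table_pk_first (
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL,
    CONSTRAINT PK_the_table_pk_first PRIMARY KEY (a, b)
);

-- Variant 2: load a heap first, add the primary key afterwards.
CREATE TABLE dbo.the_table_pk_after (
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL
);

DECLARE @t0 datetime2 = SYSDATETIME();
INSERT INTO dbo.the_table_pk_first (a, b, c) SELECT a, b, c FROM dbo.src_rows;
DECLARE @pk_first_ms int = DATEDIFF(millisecond, @t0, SYSDATETIME());

SET @t0 = SYSDATETIME();
INSERT INTO dbo.the_table_pk_after (a, b, c) SELECT a, b, c FROM dbo.src_rows;
ALTER TABLE dbo.the_table_pk_after
    ADD CONSTRAINT PK_the_table_pk_after PRIMARY KEY (a, b);
DECLARE @pk_after_ms int = DATEDIFF(millisecond, @t0, SYSDATETIME());

-- Both tables end in the same state: clustered PK on (a, b), 880,000 rows.
SELECT @pk_first_ms AS pk_first_ms, @pk_after_ms AS pk_after_ms;
```

The second timing includes the `ALTER TABLE ... ADD CONSTRAINT`, so both numbers cover the same end state. Caching and log growth can skew a single run, so repeating the script a few times gives a fairer picture.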

More info:

  • The estimated execution plan shows 0% total query time spent on the clustered index insert
  • SQL Server 2008 R2 Developer edition, 10.50.1600

Any ideas?

Eric
  • No revelation there. Removing all indexes, inserting all data, and then recreating indexes is often faster than simply inserting the data... – Mitch Wheat Apr 01 '11 at 04:59
  • Yes. A PK in SQL Server is always backed by an index, either clustered or non-clustered. Thus, as Mitch indicates, it'll be much faster to insert without any index and then recreate it. – RollingBoy Apr 01 '11 at 05:12
  • It is amazing how fast the inserts are without any uniqueness constraints, especially if you are doing inserts one row at a time. – Thomas Apr 01 '11 at 05:25
  • For the record: what you are seeing is the housekeeping of the index taking time. For every insert, the index also has to be updated. Removing it and adding it afterwards eliminates that. Indexes aren't free. They do speed up searches but slow down inserts. Updates & deletes kind of depend on the specific statement. – Lieven Keersmaekers Apr 01 '11 at 06:18
  • @marc_s, the question was why the PK was so seriously affecting performance. A 15x difference seemed too steep to me. I wasn't clear in specifically stating the question, you're right. – Eric Apr 01 '11 at 16:48
  • Always use an auto-incremented column as the PK, even if you don't use it, and use a non-clustered index for what you wanted as the primary key. Summary of @Ryk's answer. [link](http://sqlcruiser.blogspot.com/2010/05/why-fragmentation-occurs-and-how-to.html) – TamusJRoyce Jul 20 '11 at 17:24
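One way to make the index housekeeping described in these comments visible is to insert rows one at a time, as in Thomas's comment, into two copies of a composite-key table: once in ascending key order and once in scattered order. You can then inspect both with `sys.dm_db_index_physical_stats`. The table names, the 50,000-row count and the stride `7919` are assumptions for illustration. Because 7919 is a prime that does not divide 50,000, `(@i * 7919) % 50000` visits every key exactly once. A scattered insert that lands on a full page in the middle of the index forces a page split, which usually shows up as a higher `avg_fragmentation_in_percent`. The sketch only reports the numbers and does not promise specific values.

```sql
IF OBJECT_ID('dbo.frag_ascending') IS NOT NULL DROP TABLE dbo.frag_ascending;
IF OBJECT_ID('dbo.frag_scattered') IS NOT NULL DROP TABLE dbo.frag_scattered;

CREATE TABLE dbo.frag_ascending (
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL,
    CONSTRAINT PK_frag_ascending PRIMARY KEY CLUSTERED (a, b)
);
CREATE TABLE dbo.frag_scattered (
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL,
    CONSTRAINT PK_frag_scattered PRIMARY KEY CLUSTERED (a, b)
);

DECLARE @i int = 0;
BEGIN TRANSACTION;                 -- one commit instead of 100,000 autocommits
WHILE @i < 50000
BEGIN
    -- Ascending: each new key sorts after all existing keys (goes on the last page).
    INSERT INTO dbo.frag_ascending (a, b, c) VALUES (@i, 0, 0);
    -- Scattered: visits every key 0..49999 exactly once, out of order.
    INSERT INTO dbo.frag_scattered (a, b, c) VALUES ((@i * 7919) % 50000, 0, 0);
    SET @i += 1;
END;
COMMIT TRANSACTION;

-- index_depth = number of B-tree levels;
-- avg_fragmentation_in_percent = leaf pages out of logical order.
SELECT 'frag_ascending' AS table_name, index_depth, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.frag_ascending'), 1, NULL, 'LIMITED')
UNION ALL
SELECT 'frag_scattered', index_depth, page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.frag_scattered'), 1, NULL, 'LIMITED');
```

Set-based `INSERT ... SELECT` loads behave differently, because the optimizer may add its own sort on the clustering key before the Clustered Index Insert. The actual execution plan is the place to confirm which behaviour you are getting.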

3 Answers

Actually it's not as clear-cut as Ryk suggests.

It can actually be faster to add data to a table with an index than to a heap.

Read this article; as far as I am aware, it's quite well regarded:

http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx

Bear in mind it's written by a SQL Server MVP and Microsoft Regional Director.

Inserts are faster in a clustered table (but only in the "right" clustered table) than compared to a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting".
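A rough way to test the quoted claim on your own hardware is to time single-row inserts into a heap and into a table clustered on an ever-increasing identity column. The table names and the 50,000-row count are hypothetical. A single-session run like this one will not capture everything the article discusses, so treat the numbers as a starting point rather than a verdict.

```sql
IF OBJECT_ID('dbo.ins_heap') IS NOT NULL DROP TABLE dbo.ins_heap;
IF OBJECT_ID('dbo.ins_clustered') IS NOT NULL DROP TABLE dbo.ins_clustered;

-- A heap: no clustered index, so SQL Server must find free space for each row.
CREATE TABLE dbo.ins_heap (
    id int IDENTITY(1, 1) NOT NULL,
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL
);

-- Clustered on an ever-increasing key: every new row belongs at the end.
CREATE TABLE dbo.ins_clustered (
    id int IDENTITY(1, 1) NOT NULL
        CONSTRAINT PK_ins_clustered PRIMARY KEY CLUSTERED,
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL
);

DECLARE @rows int = 50000, @i int, @t0 datetime2, @heap_ms int, @clustered_ms int;

SET @i = 0; SET @t0 = SYSDATETIME();
BEGIN TRANSACTION;
WHILE @i < @rows
BEGIN
    INSERT INTO dbo.ins_heap (a, b, c) VALUES (@i, @i % 10, @i % 256);
    SET @i += 1;
END;
COMMIT TRANSACTION;
SET @heap_ms = DATEDIFF(millisecond, @t0, SYSDATETIME());

SET @i = 0; SET @t0 = SYSDATETIME();
BEGIN TRANSACTION;
WHILE @i < @rows
BEGIN
    INSERT INTO dbo.ins_clustered (a, b, c) VALUES (@i, @i % 10, @i % 256);
    SET @i += 1;
END;
COMMIT TRANSACTION;
SET @clustered_ms = DATEDIFF(millisecond, @t0, SYSDATETIME());

SELECT @heap_ms AS heap_ms, @clustered_ms AS clustered_ms;
```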

Mike Dinescu
  • Daniel, that's an interesting post. It doesn't necessarily answer the question of why the insert was slower. Ryk did that, but it does provide some helpful insights in showing that things aren't always purely black & white. – Mike Dinescu Sep 28 '12 at 15:58
  • [but only in the "right" clustered table] - They know how to phrase their words, because it is MUCH easier to NOT have the "right" clustered table than it is to have the right one. Consider a table where you have a clustered index on a column you use 100% of the time, but it is a date (unique) column. Most people I know would chuck a clustered index on there. Now populate that table with, say, 20 million records and try to insert one. Now change the table to a heap and see the difference. What comes out of this is that it is not black and white, and it requires some understanding. – Ryk Oct 29 '12 at 21:53
  • OK, so it seems like we need to break this out into two answers. If your table has a single primary key that is an auto-incrementing integer, then that is the fastest insert. If your table has a primary key that is not an auto-incrementing int, then you should just use a heap instead and not have a primary key? – Robert Sep 04 '13 at 17:03

I think if you create a simple primary key that is clustered and made up of a single auto-incrementing column, then inserts into such a table might be faster. Most likely, the primary key made up of multiple columns is the cause of the slowdown. When you use a composite key as the primary key, inserted rows may not get added to the end of the table but may need to go somewhere in the middle of the existing order of rows, which adds to the insert time and hence makes the inserts slower. So use a single auto-incrementing column as the primary key in your case to speed up inserts.
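Here is a sketch of the layout this answer suggests, using a hypothetical table name. A narrow identity column is the clustered primary key, so new rows are appended at the end, and a unique nonclustered constraint on (a, b) keeps the original key enforced. Note that the nonclustered index still has to be maintained on every insert, in (a, b) order. The design therefore trades one kind of index maintenance for another rather than removing it.

```sql
IF OBJECT_ID('dbo.the_table_surrogate') IS NOT NULL DROP TABLE dbo.the_table_surrogate;

CREATE TABLE dbo.the_table_surrogate (
    id int      IDENTITY(1, 1) NOT NULL,  -- ever-increasing surrogate key
    a  int      NOT NULL,
    b  smallint NOT NULL,
    c  tinyint  NOT NULL,
    -- New rows always land at the logical end of the clustered index.
    CONSTRAINT PK_the_table_surrogate PRIMARY KEY CLUSTERED (id),
    -- The natural key stays unique, enforced by a separate nonclustered index.
    CONSTRAINT UQ_the_table_surrogate_a_b UNIQUE NONCLUSTERED (a, b)
);

-- id is assigned automatically; duplicate (a, b) pairs are still rejected.
INSERT INTO dbo.the_table_surrogate (a, b, c)
VALUES (1, 1, 10), (1, 2, 20), (2, 1, 30);
```

For the original load, the insert would become `insert into the_table_surrogate (a,b,c) select a,b,c from some_view;`.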

Sunil

This is a good question, but a pretty crappy question too. Before you ask why an index slows down inserts, do you know what an index is?

If not, I suggest you read up on it. A clustered index is a B-tree (balanced tree), so every insert has to... wait for it... balance the tree. Hence clustered inserts are slower than inserting into heaps. If you don't know what a heap is, then I suggest you stop using SQL Server until you understand the basics. Otherwise you are attempting to use a product without any idea of what you are doing: basically driving a truck on the highway, blindfolded, thinking you are riding a bike. Unexpected results...

So when you create a clustered index after a table is populated, your 'heap' has some statistics to use, and SQL Server can basically optimise a few things. This process is much more complicated than this, and in some cases you will find that creating a clustered index after the fact could be a lot slower than simply inserting into it. This all has to do with key types, number of columns, types of columns, etc. Unfortunately this is not a topic that fits in an answer; it is more a whole course and a few books by itself. Looking at your table above, it is a VERY simple table with ~7-byte rows (int + smallint + tinyint = 4 + 2 + 1 bytes of data). In this instance a create-index after the insert will be faster, but chuck in a few varchar(250)s, etc., and the ballgame changes.

If you didn't know, a clustered index (if your table has one) IS your table.
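That last point can be checked in the catalog. Here is a sketch with a hypothetical table `dbo.heap_or_clustered`. Before a clustered index exists, `sys.indexes` lists the table as `index_id` 0 (`HEAP`). Once the clustered primary key is added, that row is replaced by `index_id` 1 (`CLUSTERED`), because the clustered index's leaf level now holds the data rows themselves.

```sql
IF OBJECT_ID('dbo.heap_or_clustered') IS NOT NULL DROP TABLE dbo.heap_or_clustered;

CREATE TABLE dbo.heap_or_clustered (a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL);

-- No clustered index yet: a single row with index_id 0, type_desc HEAP.
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.heap_or_clustered');

ALTER TABLE dbo.heap_or_clustered
    ADD CONSTRAINT PK_heap_or_clustered PRIMARY KEY CLUSTERED (a, b);

-- The heap entry is gone; index_id 1 (CLUSTERED) is now the table's data.
SELECT index_id, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.heap_or_clustered');
```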

Hope this helps.

Ryk
  • Thanks Ryk - your explanation is excellent. I had a basic understanding of clustered indexes - I was just surprised by the 15x performance difference. I expected a performance hit when inserting with a PK, but not so severe as this was. It sounds like the small size of my table amplifies the relative performance overhead of inserting when a PK is active. – Eric Apr 01 '11 at 16:52
  • If you are going to down vote, at least explain your reasoning - here is also a good read http://stackoverflow.com/questions/4034076/reasons-not-to-have-a-clustered-index-in-sql-server-2005 – Ryk Nov 01 '12 at 01:19
  • Downvoting this answer because I feel that it is unnecessary to say things like "If you don't know what a heap is, then I suggest stop using SQL Server until you understand basics". That is just a bit too harsh. Everyone starts out a beginner. – avl_sweden Jun 03 '13 at 16:54
  • Writing data to a table without a clustered PK and then adding it after can be more expensive if the data is not written in order. When adding a clustered PK, the data has to be physically reorganised causing 3 times as much I/O (write, read, write of potentially every page) compared to writing the data once in order with a PK already in place. – Andy Bradbrook Jan 21 '16 at 10:18
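Following this comment, here is a sketch of loading with the clustered primary key already in place and the rows requested in key order. It uses a hypothetical generated source (`dbo.load_src`) whose keys are produced in scattered order. The `ORDER BY` states the intended order, but the optimizer still chooses the plan and may add its own sort on (a, b) before the Clustered Index Insert, so the actual execution plan is worth checking.

```sql
IF OBJECT_ID('dbo.load_src') IS NOT NULL DROP TABLE dbo.load_src;
IF OBJECT_ID('dbo.the_table_ordered') IS NOT NULL DROP TABLE dbo.the_table_ordered;

-- Hypothetical source: keys 0..99999 generated in scattered order
-- (7919 does not divide 100000, so each key appears exactly once).
CREATE TABLE dbo.load_src (a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL);
INSERT INTO dbo.load_src (a, b, c)
SELECT (n * 7919) % 100000, 0, n % 256
FROM (SELECT TOP (100000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
      FROM sys.all_objects AS o1
      CROSS JOIN sys.all_objects AS o2) AS numbers;

-- Target with the clustered PK created up front.
CREATE TABLE dbo.the_table_ordered (
    a int NOT NULL, b smallint NOT NULL, c tinyint NOT NULL,
    CONSTRAINT PK_the_table_ordered PRIMARY KEY CLUSTERED (a, b)
);

-- Write the data once, in clustered-key order.
INSERT INTO dbo.the_table_ordered (a, b, c)
SELECT a, b, c
FROM dbo.load_src
ORDER BY a, b;

-- Leaf-level fragmentation of the clustered index after the load.
SELECT page_count, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.the_table_ordered'), 1, NULL, 'LIMITED');
```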