0

We have a transaction table of over 111m rows that has a clustered composite primary key of...

RevenueCentreID  int
DateOfSale       smalldatetime
SaleItemID       int
SaleTypeID       int

...in a SQL 2008 R2 database.

We are going to be truncating and refilling the table soon for an archiving project, so the opportunity to get the indexes right will be once the table has been truncated. Would it be better to keep the composite primary key or should we move to a unique auto increment primary key?

Most searches on the table are done using the DateOfSale and RevenueCentreID columns. We also often join to the SaleItemID column. We hardly ever use the SaleType column, in fact it is only included in the primary key for uniqueness. We dont care about how long it takes to insert & delete new sales figures(done over night) but rather the speed of returning reports.

Marcelo Cantos
  • 181,030
  • 38
  • 327
  • 365
MarcoF
  • 205
  • 2
  • 12
  • http://stackoverflow.com/questions/6493792/should-sql-server-transaction-tables-always-have-a-surrogate-primary/6494394#6494394 – HLGEM Jun 27 '11 at 14:25

3 Answers3

1

In have learned you want and need both a natural key and a surrogate key.

The natural key keeps the business keys unique and is prefect for indexing. where the surrogate key will help with queries and development.

So in your case a surrogate auto incrementing key is good in the fact it will help keep all the rows of data in tact. And a natural key of DateOfSale, RevenueID and maybe ClientID would make a great way of ensuring no duplicate records are being stored and speed up querying because you can index the natural key.

KenL
  • 865
  • 5
  • 14
  • "So in your case a surrogate auto incrementing key is good in the fact it will help keep all the rows of data in tact." What does that mean? That without a surrogate ID number, rows are in danger of flying apart? – Mike Sherrill 'Cat Recall' Jul 30 '11 at 11:59
1

A surrogate key serves no purpose here. I suggest a clustered primary key on the columns as listed, and an index on SaleItemID.

Marcelo Cantos
  • 181,030
  • 38
  • 327
  • 365
  • I agree that it is redundant in terms of enforcing uniqueness and referential integrity. However, having a single value to identify a unique row *can* make certain complex queries simpler. I'd say (and this is deliberately redundant) the surrogate key is only required if you have a query where it's required. – MatBailie Jun 27 '11 at 12:58
  • Even if you don't need it now - having a surrogate key makes it so much easier to use an ORM at some point in the future. – BonyT Jun 27 '11 at 18:04
  • @BonyT: ORMs are broken. See [here](http://stackoverflow.com/questions/760834/question-about-the-benefit-of-using-an-orm/2671551#2671551) for a detailed exposition of why I say this. – Marcelo Cantos Jun 28 '11 at 09:09
  • @Dems: It would only make queries simpler if some other relation had a foreign key on the surrogate. Since evidently nothing currently has a foreign on the existing candidate key, this wouldn't be the case. – Marcelo Cantos Jun 28 '11 at 09:11
  • @Marcelo - I disagree - quite strongly with your statement, but I suspect we share common sentiments. Having worked without an ORM, with home baked ORM's, and with NHibernate then I know which I prefer. However, bolting on an ORM to a schema design badly designed for it is just plain stupid. Like anything else in the software industry ORM is a tool, and any tool can be misused. – BonyT Jun 28 '11 at 09:17
  • @BonyT: I've worked with NHibernate and found it to be nothing but pain and frustration for anything but the most basic toy problems. The problems stem from the very root idea of treating facts as objects. Facts aren't objects, and every attempt I've seen to pretend otherwise has always led to hell and damnation. In any event, I've said more than my piece in the answer I linked to, above. You can also find a fairly comprehensive discussion of the problems with ORM [here](http://bit.ly/aPTAxO) (though solution 6 — my preference — contradicts the author's assertion that ORM is a necessity). – Marcelo Cantos Jun 28 '11 at 11:38
  • @Marcelo: I've read the linked article and posted most of my thoughts there. – BonyT Jun 28 '11 at 11:45
  • @Marcelo: There are many situation that I have encountered where an individual record is located via use of a correlated sub-query. Where-as Oracle can do `WHERE (a,b) = (SELECT a,b FROM blah)`, SQL Server does not. As such the existence of a surrogate allows `WHERE id = (SELECT id FROM blah)`. There are other examples. – MatBailie Jun 28 '11 at 23:05
  • @Dems: Most SQL engines — including SQL Server — can do this: `WHERE EXISTS (SELECT * FROM blah WHERE a = outer.a AND b = outer.b)`. – Marcelo Cantos Jun 29 '11 at 11:56
0

If you don't care about the speed of inserts and deletions, then you probably want multiple indexes which target the queries precisely.

You could create an auto increment primary key as you suggest, but also create indexes as required to cover the reporting queries. Create a unique constraint on the columns you currently have in the key to enforce uniqueness.

Index tuning wizard will help with defining the optimum set of indexes, but it's better to create your own.

Rule of thumb - you can define columns to index, and also "include" columns.

If your report has an OrderBy or a Where clause on a column then you need the index to be defined against these. Any other fields returned in the select should be included columns.

BonyT
  • 10,750
  • 5
  • 31
  • 52