
I'm redesigning a database for a project, since the current version no longer matches the reality of its usage and is causing serious problems.

Several of the main tables consist of this structure:

  • Object Identifier (FK, 4 byte int)
  • Datetime of record (can be duplicate for different objects)
  • Value (float)

From other questions (e.g. Composite Clustered Index in SQL Server) I gather that clustered indexes should avoid composite keys and should preferably use smaller datatypes.

The data in this table (several million records, growing by about 10k a day) is only ever inserted, or queried based on the datetime field (get the min/max value, or all values for an object in a date range); updates and deletes almost never happen, and only outside the hours when querying happens. Inserts usually arrive in order of increasing datetime, but slight deviations are observed and sometimes larger backlogs need to be inserted. The order of a potential ID field would match the datetime order for a single object, but not in general across all records.
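For reference, a minimal sketch of the table as described, with hypothetical table and column names, and the two query shapes mentioned:

```sql
-- Hypothetical schema matching the description above
CREATE TABLE dbo.Measurement
(
    ObjectId   INT      NOT NULL,  -- FK, 4-byte int
    RecordedAt DATETIME NOT NULL,  -- can repeat across different objects
    Value      REAL     NOT NULL
);

-- Typical query: min/max (or all) values for one object in a date range
SELECT MIN(Value), MAX(Value)
FROM dbo.Measurement
WHERE ObjectId = @ObjectId
  AND RecordedAt >= @From AND RecordedAt < @To;
```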

My question is: I see several different ways to do Keys/Indexes on this kind of table, but what approach for the keys would work best in these cases?

Kevin Sijbers
  • I believe the best option would be as you described, since the data is queried by the datetime field. You could put a clustered **non-unique** index on this datetime field (since your data in this column isn't unique). This way SQL Server internally makes it unique by adding a uniqueidentifier next to duplicate datetime records (which you don't see). It probably is an overhead to consider in space taken on the hard drive, but on the other hand your queries should fly. – MK_ Mar 09 '17 at 10:09
  • You might consider moving this to http://dba.stackexchange.com – SqlZim Mar 09 '17 at 14:01

1 Answer


In your case a one-column clustered index on your DATETIME column alone won't work as a key: the values aren't unique. (SQL Server does allow a non-unique clustered index, but then it silently adds a 4-byte uniquifier to duplicate key values.)

To avoid fragmentation, it is very important that rows are inserted in the clustered index's sort order. The easiest way to guarantee that is an IDENTITY column...
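A minimal sketch of that approach, reusing the hypothetical table name from the question:

```sql
-- Surrogate key: new rows always append at the end of the clustered index,
-- so pages fill sequentially and fragmentation stays low
ALTER TABLE dbo.Measurement
    ADD MeasurementId INT IDENTITY(1,1) NOT NULL;

ALTER TABLE dbo.Measurement
    ADD CONSTRAINT PK_Measurement
    PRIMARY KEY CLUSTERED (MeasurementId);
```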

When you create a clustered index, the clustered index is the table. It is, literally, the physically stored data, and it implicitly includes all other columns. Any further index (non-clustered, of course) uses the clustered index key as its row locator, so lookups go back through the clustered index.
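With the IDENTITY clustering above, the date-range queries would then be served by a non-clustered index; adding the value via INCLUDE makes it covering, so no lookup into the clustered index is needed (index name hypothetical):

```sql
-- Seeks on (ObjectId, RecordedAt); INCLUDE (Value) makes the index
-- covering, avoiding key lookups into the clustered index
CREATE NONCLUSTERED INDEX IX_Measurement_Object_Date
    ON dbo.Measurement (ObjectId, RecordedAt)
    INCLUDE (Value);
```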

With several million records you run into I/O issues: many pages must be read to gather all the values. Starting with SQL Server 2012, Microsoft introduced columnstore indexes, which combine the advantages of a relational DBMS with the speed of column-oriented storage systems. The data is stored column-wise in compressed segments, which can speed up queries against mass data enormously! The price you pay is disk space and slower data modification.
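A sketch of that option. Note the version caveat: on SQL Server 2012 only the non-clustered columnstore exists, and while it is present the table becomes read-only; an updatable clustered columnstore arrived in SQL Server 2014:

```sql
-- SQL Server 2012: non-clustered columnstore
-- (caveat: the table is read-only while this index exists)
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Measurement
    ON dbo.Measurement (ObjectId, RecordedAt, Value);

-- From SQL Server 2014 an updatable clustered columnstore is also possible:
-- CREATE CLUSTERED COLUMNSTORE INDEX CCI_Measurement ON dbo.Measurement;
```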

You might use a two-column clustered index that combines the date (in first position) with an IDENTITY column. This will sort your table physically in date order, and it should work pretty fast too.
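A sketch of that variant, assuming an IDENTITY column already exists on the table (here called MeasurementId; all names hypothetical):

```sql
-- Clustered on (date, identity): rows are physically ordered by date,
-- and the IDENTITY column breaks ties so the key is unique
CREATE UNIQUE CLUSTERED INDEX CIX_Measurement_Date
    ON dbo.Measurement (RecordedAt, MeasurementId);
```

Since inserts mostly arrive in increasing datetime order, this keeps appends near the end of the index; only the occasional out-of-order backlog would insert into the middle.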

What is best for you depends on many details; profiling against your real workload is the only way to know for sure.

Shnugo
  • Thank you for the answer. The link also seems interesting, but will profile the situation with just applying the suggested index first. – Kevin Sijbers Mar 09 '17 at 14:16
  • @Shnugo I have the same requirement. How would I go about creating the clustered index on date and identity? do I alter the primary key or just create a new index? – rethabile Apr 03 '18 at 10:01
  • @legen---waitforit---dary Please open a new question and provide some more details. You might place a comment here with a link to the new question. – Shnugo Apr 03 '18 at 11:15
  • Hi @Shnugo Thanks. I opened a new question and think I got what I was looking for. – rethabile Apr 03 '18 at 12:17
  • would it be not heavy to use a GUID with Date as clustered index? – sshanzel Jan 29 '20 at 13:38