4

I have a huge (millions of rows) table that contains the following columns:

[When] datetime2(0),
[What] tinyint,
[Who] bigint

It's basically a table of events ('What') that happened to various entities ('Who') at a time ('When'). There are about 10 different values for 'What' - it is an enumeration. There are about 10,000 values for 'Who' at the moment.

I want to be able to query this table, to ask whether something happened to a one or more entities between a certain time. For instance, did [What = 0, 1, 2, 3] happen to [Who = 0, 1, 2, 3] between [When = '2012-10-01' to '2012-11-01'].

I'm looking for advice on how best to index this table. I'm fairly sure a composite index is the way to go, but I'm unsure of the exact configuration. For instance, what should the column order be? I've read that the 'most selective' columns should go to the left, which I think in this case would be [When], [Who], [What]. Is this correct?

Sorry if this question seems vague, but I'd be grateful for any input. I'm using SQL Server 2012.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Barguast
  • 5,926
  • 9
  • 43
  • 73

1 Answers1

5

The myth about put the most selective column to the left is crap - sorry.

Your composite index will only be useful if you use the n left-most arguments, e.g. if you have the index on

(when, who, what)

then that index can answer question asking about

(when)

or about

(when, who)

or even about

(when, who, what)

but it cannot answer questions about

(who, what)

(since the left-most column isn't being used here).

That should be the point you consider - order the columns in such a way that you can answer the most of your questions with such a composite index.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • Thanks, that makes a lot of sense. From a query point-of-view, it seems the column order doesn't make much difference in my scenario. What about fragmentation? Typically, new rows will always have a higher [When], but obviously the [Who] and [What] are not going to be as predictable. Will having [When] as the first column decrease the fragmentation, or - again - is there no real difference? Thanks again. – Barguast Nov 26 '12 at 16:17
  • 1
    In this example, where you search by all three columns, I don't agree that putting the most selective column on the left is crap. – Eduardo Nov 26 '12 at 16:18
  • 1
    @Barguast: having a (hopefully/potentially) steadily increasing first column can be beneficial, since it will need to fewer "out of order" inserts and thus to fewer page splits. – marc_s Nov 26 '12 at 16:19
  • 1
    @Eduardo: see [Myth: Most selective column at first position](http://use-the-index-luke.com/sql/myth-directory/most-selective-first) and [Index columns, selectivity and equality predicates](http://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/) - it's a documented (and awfully persistant) **myth** - but there's no truth to it. – marc_s Nov 26 '12 at 16:22
  • 1
    @marc_s: you are right, and I learnt something very useful; thanks +1. – Eduardo Nov 26 '12 at 16:24