2

I need help understanding how to create proper indexing on a table for fast range selects.

I have a table with the following columns:

Column --- Type


frameidx --- int

u --- int

v --- int

x --- float(53)

y --- float(53)

z --- float(53)


None of these columns is unique.

There are to be approximately 30 million records in this table.

An average query would look something like this:

   Select x, y, z from tablename
   Where
       frameidx = 4 AND
       u between 34 AND 500
       v between 0 AND 200

Pretty straight forward, no joins, no nested stuff. Just good ol' subset selection.

What sort of indexing should I do in MS SQL Server (2012) for this table in order to be able to fetch records (which can be in the thousands from this query) in (ideally) less than a 100ms, for example?

Thanks.

BKaun
  • 517
  • 7
  • 17
  • Assuming you are always filtering on `frameidx`, `u` and `v` I think a composite index on the 3 columns, with `x`, `y` and `z` as non key columns would be the most versatile: `CREATE NONCLUSTERED INDEX IX_tablename__frameidx_u_v ON tablename INCLUDE (x, y, z);` - However, I think expecting to optimise a query for all ranges to return thousands of rows from a table with 30m rows in less than 100ms is a little optimistic. – GarethD Sep 01 '14 at 08:26

1 Answers1

3

If you don't have indices, SQL Server needs to scan the whole table to find the required data. For such a big table (30M rows), that's time consuming.

If you have indices appropriate for your query, the SQL server will seek them (i.e. it will quickly find the required rows in the index, using the index structure). The index consists of the indexed column values, in the given index order, and pointers to the rows in the indexed table, so once the data is found in the index, the necessary data from the indexed table is recovered using those pointers.

SO, if you want to speed up thing, you need to create indexes for the columns which you're going to use to filter the ranges.

Adding indexes will improve the query response time, but will also take up more space, and make the insertions slower. So you shouldn't create a lot of indexes.

If you're going to use all the columns for filtering all the time, you should make only one index. And, ideally, that index should be the more selective, i.e. the one that has the most different values (the least number of repeated values). Only one index can be used for each query.

If you're going to use different sets of range filters, you should create more indexes.

Using a composite can be good or bad. In a composite key, the rows are ordered by all of the columns in the index. So, provided you index by A, B, C & D, filtering or ordering by A will give consecutive rows of the index, and it's a quick operation. And filtering by A, B, C & D, is ideal for this index. However, filtering or ordering only by D, is the worst case for this index, because it will need to recover data spread all over the index: remember that the data is ordered by A, then B, then C, then D, so the D info is spread all over the index. Depending on several factors (table stats, index selectivity, and so on), it's even possible that no index is used at all, and the table is scanned.

A final note on the clustered index: a clustered index defines the physical order in which the data is stored in the table. It doesn't need to be unique. If you're using one of the columns for filtering most of the times, it's a good idea to make that the table's clustered index, because, in this case, instead of seeking an index and finding the data in the indexed table using pointers, the table is sought directly, and that can improve performance.

So there is no simple answer, but I hope to know you have info to improve your query speed.

EDIT

Corrected info, according to a very interesting comment.

Arsen Khachaturyan
  • 7,904
  • 4
  • 42
  • 42
JotaBe
  • 38,030
  • 8
  • 98
  • 117
  • Forgive me for being thick, I'm still learning. So I should add a auto-generated primary key column even if that column is never part of the query? Also, it should be noted that space and insertion times are not an issue at all, as the database will be prepared 'offline' and then only select-queried 'online'. So with that in mind, should I create a single composite index on columns frameidx, u, and v? – BKaun Sep 01 '14 at 08:07
  • 1
    *"First you should ad a Primary key for the table, so that it can be properly indexed*" - This is not true at all! Firstly I assume you mean clustered index rather than primary key, secondly [heap tables](http://msdn.microsoft.com/en-us/library/hh213609.aspx) will use an efficient row pointer in the absence of a clustered index and are commonly used when data is always accessed through nonclustered indexes, so a clustered index is **not** required to index properly. – GarethD Sep 01 '14 at 08:10
  • @GarethD Wow, it's great what you pointed out! I didn't know about the RID. And I had inconsciously assumed that the clustered index should be unique (I didn't want to complicate the answer, so I sppke about PK). Thanks to your comment now I see the clustered index can be non-unique, and besides is possible to use the RID of a heap for indexing without need for a unique PK. – JotaBe Sep 01 '14 at 08:50
  • +1 for the edit, indexing questions this vague are almost impossible to answer, the best that can usually be achieved is explaining how the index works, and hoping that this enough to get the OP started, which is exactly what you have done. – GarethD Sep 01 '14 at 08:56
  • 1
    Thank you both for the help. I learned a lot about cluster/non-cluster indexes, and eventually through trial and error with the execution plan analyzer, learned that I needed a non-cluster index on the frameidx,u and v columns, with the column includes on the remaining x, y and z columns (just as GaraethD had provided up top in the comments) The server is now returning ~54'000 rows in about 101ms. Pretty good and just spot on for what I was looking for. Thanks so much guys. Really appreciate it. – BKaun Sep 02 '14 at 02:55
  • The statement "only one index can be used per query" is not correct. According to this reference "Queries use one index per joined table. When your SQL engine is planning the execution of a query, it can only lean on one index per joined table per query.". http://blog.celerity.com/how-to-design-sql-indexes – Action Dan Mar 07 '19 at 21:22