45

I have a table in SQL Server database which I want to be able to search and retrieve data from as fast as possible. I don't care about how long time it takes to insert into the table, I am only interested in the speed at which I can get data.

The problem is the table is accessed with 20 or more different types of queries. This makes it a tedious task to add an index specially designed for each query. I'm considering instead simply adding an index that includes ALL columns of the table. It's not something you would normally do in "good" database design, so I'm assuming there is some good reason why I shouldn't do it.

Can anyone tell me why I shouldn't do this?

UPDATE: I forgot to mention, I also don't care about the size of my database. It's OK that it means my database size will grow larger than it needed to

Niels Brinch
  • 3,033
  • 9
  • 48
  • 75
  • 5
    *"Why can't I simply add an index that includes all columns?"* Wouldn't that be...the table? – T.J. Crowder Aug 29 '15 at 06:52
  • 2
    @T.J.Crowder yes but not exactly. – ypercubeᵀᴹ Jan 10 '17 at 13:44
  • 2
    @T.J.Crowder -- The order of the columns matters greatly. A simple example is in a many:many mapping table where you need to go both directions. See example [_here_](http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table). – Rick James Jan 11 '17 at 00:28

9 Answers9

88

First of all, an index in SQL Server can only have at most 900 bytes in its index entry. That alone makes it impossible to have an index with all columns.

Most of all: such an index makes no sense at all. What are you trying to achieve??

Consider this: if you have an index on (LastName, FirstName, Street, City), that index will not be able to be used to speed up queries on

  • FirstName alone
  • City
  • Street

That index would be useful for searches on

  • (LastName), or
  • (LastName, FirstName), or
  • (LastName, FirstName, Street), or
  • (LastName, FirstName, Street, City)

but really nothing else - certainly not if you search for just Street or just City!

The order of the columns in your index makes quite a difference, and the query optimizer can't just use any column somewhere in the middle of an index for lookups.

Consider your phone book: it's order probably by LastName, FirstName, maybe Street. So does that indexing help you find all "Joe's" in your city? All people living on "Main Street" ?? No - you can lookup by LastName first - then you get more specific inside that set of data. Just having an index over everything doesn't help speed up searching for all columns at all.

If you want to be able to search by Street - you need to add a separate index on (Street) (and possibly another column or two that make sense).

If you want to be able to search by Occupation or whatever else - you need another specific index for that.

Just because your column exists in an index doesn't mean that'll speed up all searches for that column!

The main rule is: use as few indices as possible - too many indices can be even worse for a system than having no indices at all.... build your system, monitor its performance, and find those queries that cost the most - then optimize these, e.g. by adding indices.

Don't just blindly index every column just because you can - this is a guarantee for lousy system performance - any index also requires maintenance and upkeep, so the more indices you have, the more your INSERT, UPDATE and DELETE operations will suffer (get slower) since all those indices need to be updated.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
  • 3
    Great answer, thanks. You mention the order of the index: Will the mentioned index work well both for "WHERE LastName = 'a' ORDER BY FirstName" and for "WHERE FirstName = 'a' ORDER BY LastName"? – Niels Brinch Mar 29 '11 at 04:40
  • 1
    @Niels Brinch: an index always works for the first n columns it contains; if it's on (LastName, FirstName), it will help for WHERE or ORDER BY on (LastName) or on (LastName,FirstName) - but **not** for (FirstName) alone (neither a `WHERE FirstName = ...` nor a `ORDER BY FirstName` will benefit from such an index). – marc_s Mar 29 '11 at 04:46
  • Does that mean "WHERE FirstName = 'a' ORDER BY LastName" would not benefit from the "LastName, FirstName" index? – Niels Brinch Mar 29 '11 at 05:18
  • @Niels Brinch: in that case, the `ORDER BY LastName` would benefit - the `WHERE FirstName = 'a'` would **NOT** benefit – marc_s Mar 29 '11 at 05:22
12

You are having a fundamental misunderstanding how indexes work.

Read this explanation "how multi-column indexes work".

The next question you might have is why not creating one index per column--but that's also a dead-end if you try to reach top select performance.

You might feel that it is a tedious task, but I would say it's a required task to index carefully. Sloppy indexing strikes back, as in this example.

Note: I am strongly convinced that proper indexing pays off and I know that many people are having the very same questions you have. That's why I'm writing a the a free book about it. The links above refer the pages that might help you to answer your question. However, you might also want to read it from the beginning.

Markus Winand
  • 8,371
  • 1
  • 35
  • 44
3

I think the questioner is asking

'why can't I make an index like':

create index index_name
on table_name
(
    *
)

The problems with that have been addressed.

But given it sounds like they are using MS sql server. It's useful to understand that you can include nonkey columns in an index so they the values of those columns are available for retrieval from the index, but not to be used as selection criteria :

create index index_name
on table_name
(
    foreign_key
)
include (a,b,c,d) -- every column except foreign key

I created two tables with a million identical rows

I indexed table A like this


create nonclustered index index_name_A
on A
(
    foreign_key -- this is a guid
)

and table B like this

create nonclustered index index_name_B
on B
(
    foreign_key -- this is a guid
)
include (id,a,b,c,d) -- ( every key except foreign key)

no surprise, table A was slightly faster to insert to.

but when I and ran these this queries

select * from A where foreign_key = @guid
select * from B where foreign_key = @guid

On table A, sql server didn't even use the index, it did a table scan, and complained about a missing index including id,a,b,c,d

On table B, the query was over 50 times faster with much less io

forcing the query on A to use the index didn't make it any faster

select * from A where foreign_key = @guid
select * from A with (index(index_name_A)) where foreign_key = @guid

compound eye
  • 1,898
  • 18
  • 23
  • It’s surprising that it would not even use an index because other columns were not included. Almost seems like an error. Thanks for sharing. – Niels Brinch Apr 30 '22 at 11:44
  • @NielsBrinch Not necessarily. I have done some testing on this, and sometimes a clustered index scan is considerably less costly than an index that does not actually have the data you need. I assume this is somewhat similar in concept to how SSDs, for example, are much faster at reading large blocks of contiguous data than they are at reading tons of small, randomly positioned bits. In SQL, if your index knows where your data lives but does not store it, execution will be doing a lot of random poking around to retrieve it. That is in addition to the cost of the seek itself. – Daniel Feb 17 '23 at 21:57
2

...if you add an index that contains all columns, and a query was actually able to use that index, it would scan it in the order of the primary key. Which means hitting nearly every record. Average search time would be O(n/2).. the same as hitting the actual database.

You need to read a bit lot about indexes.

It might help if you consider an index on a table to be a bit like a Dictionary in C#.

var nameIndex = new Dictionary<String, List<int>>();

That means that the name column is indexed, and will return a list of primary keys.

var nameOccupationIndex = new Dictionary<String, List<Dictionary<String, List<int>>>>();

That means that the name column + occupation columns are indexed. Now imagine the index contained 10 different columns, nested so far deep it contains every single row in your table.

This isn't exactly how it works mind you. But it should give you an idea of how indexes could work if implemented in C#. What you need to do is create indexes based on one or two keys that are queried on extensively, so that the index is more useful than scanning the entire table.

Josh Smeaton
  • 47,939
  • 24
  • 129
  • 164
  • Surely it would depend if the DBMS indexes the columns in separate indexes or not? – ewanm89 Mar 27 '11 at 08:22
  • I'd never try to analyze how much data will be hit during query, since OP did not provide us cardinality of data, did not describe their nature and did not show us the queries he perform over data. – zerkms Mar 27 '11 at 08:22
  • 1
    Indexes also tend to be an ordered data structure, the same can't be said for the unordered data in the columns. O(n) is fastest search on unordered data. – ewanm89 Mar 27 '11 at 08:23
  • And yes, for B-Tree indexes speed of lookup is `O(logn)` which is definitely better than `O(n/2)` – zerkms Mar 27 '11 at 08:28
  • Sorry, you're right. On a non-unique unordered field, search is O(n). On a unique unordered field, search would be O(n/2) average case. That is on a table scan, which is essentially what would happen if there was an index containing every column. It just would never be used. – Josh Smeaton Mar 27 '11 at 08:45
  • O() is worst case, not average, therefore, worst case in both is entry not in the list. which makes is O(n) regardless of uniqueness. – ewanm89 Mar 27 '11 at 09:04
  • Of course, it has been taught wrongly that it can be average case in some situations. O() is from the upper bound of asymptotic growth rates in Mathematics, which means worst case. – ewanm89 Mar 27 '11 at 09:07
  • The C# example is a really great way to explain indexes. – Niels Brinch Mar 03 '15 at 11:54
2

If this is a data warehouse type operation where queries are highly optimized for READ queries, and if you have 20 ways of dissecting the data, e.g.

WHERE clause involves..

 Q1: status, type, customer
 Q2: price, customer, band
 Q3: sale_month, band, type, status
 Q4: customer
 etc

And you absolutely have plenty of fast storage space to burn, then by all means create an index for EVERY single column, separately. So a 20-column table will have 20 indexes, one for each individual column. I could probably say to ignore bit columns or low cardinality columns, but since we're going so far, why bother (with that admonition). They will just sit there and churn the WRITE time, but if you don't care about that part of the picture, then we're all good.

Analyze your 20 queries, and if you have hot queries (the hottest ones) that still won't go any faster, plan it using SSMS (press Ctrl-L) with one query in the query window. It will tell you what index can help that queries - just create it; create them all, fully remembering that this adds again to the write cost, backup file size, db maintenance time etc.

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • Creating an index for each columns would only benefit the first parameter in each query. This is why I'm curious about creating an index that includes all columns rather than one index for each column. – Niels Brinch Mar 29 '11 at 05:32
1

I'm considering instead simply adding an index that includes ALL columns of the table.

This is always a bad idea. Indexes in database is not some sort of pixie dust that works magically. You have to analyze your queries and according to what and how is being queried - append indexes.

It is not as simple as "add everything to index and have a nap"

zerkms
  • 249,484
  • 69
  • 436
  • 539
  • Thanks, I suspected as much, which is why my question was WHY is this a bad idea. – Niels Brinch Mar 29 '11 at 04:23
  • It's a "bad idea" only in that it is likely to be a waste. Very likely to be a waste if the order of the columns is the same as in the table. Might not be a waste for queries that can make use of the first column(s) for filtering and/or ordering. – Rick James Jan 11 '17 at 00:34
1

I see only long and complicated answers here so I thought I should give the simplest answer possible.

You cannot add an entire table, or all its columns, to an index because that just duplicates the table.

In simple terms, an index is just another table with selected data ordered in the order you normally expect to query it in, and a pointer to the row on disk where the rest of the data lives.

So, a level of indirection exists. You have a partial copy of a table in an preordered manner (both on disk and in RAM, assuming the index is not fragmented), which is faster to query for the columns defined in the index only, while the rest of the columns can be fetched without having to scan the disk for them, because the index contains a reference to the correct position on disk where the rest of the data is for each row.

Paul-Sebastian Manole
  • 2,538
  • 1
  • 32
  • 33
0

1) size, an index essentially builds a copy of the data in that column some easily searchable structure, like a binary tree (I don't know SQL Server specifcs). 2) You mentioned speed, index structures are slower to add to.

ewanm89
  • 919
  • 5
  • 22
  • I commented my own question saying speed for insertion etc. is not important in my case. I apologize for not putting this in the original question. – Niels Brinch Mar 29 '11 at 05:33
-1

That index would just be identical to your table (possibly sorted in another order).
It won't speed up your queries.

Albin Sunnanbo
  • 46,430
  • 8
  • 69
  • 108
  • As if you said if data in indexes is sorted - why wouldn't it speed up select queries? ;-) – zerkms Mar 27 '11 at 08:23
  • Because it would require as much I/O as a table scan and wouldn't be sorted on the columns required for different `join` or `where`. – Albin Sunnanbo Mar 27 '11 at 08:51
  • we don't know what queries he performs. Even for queries for joins or whatever it can be possible to never access table data (because all needed data is in index). – zerkms Mar 27 '11 at 08:52
  • Not to mention the DBMS caching queries and stuff. – ewanm89 Mar 27 '11 at 09:08
  • I believe the correct answer here is, it WOULD speed up my query for one VERY specific query and for other queries the index would be ignored or unusable. – Niels Brinch Mar 29 '11 at 05:30
  • It will speed up certain queries on certain datasets with certain RDBMSs. I have an Oracle example where the costs dropped from > 10,0000 to < 500. – Michael Piefel Jun 10 '16 at 06:16