20

In a former company I worked at, the rule of thumb was that a table should have no more than one index (allowing the odd exception, and certain parent-tables holding references to nearly all other tables and thus are updated very frequently).

The idea being that often, indexes cost the same or more to uphold than they gain. Note that this question is different to indexed-view-vs-indexes-on-table as the motivation is not only reporting.

Is this true? Is this index-purism worth it?

In your career do you generally avoid using indexes?

What are the general large-scale recommendations regarding indexes?

Currently and at the last company we use SQL Server, so any product specific guidelines are welcome too.

Community
  • 1
  • 1
Carlo V. Dango
  • 13,322
  • 16
  • 71
  • 114
  • 5
    What do you mean by *dangerous*? I don't think that any human lives are at stake no matter how many indexes you put on a table. – Klaus Byskov Pedersen Dec 07 '10 at 14:17
  • 2
    That sounds like "a little knowledge is a dangerous thing": a long time ago, the CEO of my employer had stumbled on an article about problems with SourceSafe. Next week, any sort of version control was banned from On High, as "it's dangerous". Hilarity ensued. – Piskvor left the building Dec 07 '10 at 14:36
  • 1
    @marc_s: You'd think? I don't know about that, I've resorted to running an illicit SVN server on my dev box and keeping track there. `file.php`,`file1.php`,`file1_.php`,`file_20071101.php` and/or `file_1piskvor.php` was the official "versioning" scheme (i.e. just give it a name different from the real thing; backups? we don't need no steenking backups!). Unexpected result: "The CEO went 'optimizing' again and now the app is broken. Do you have some kind of, y'know, previous version for those files, wink wink, nudge nudge?" (We managed to get a real SVN server eventually, after several years). – Piskvor left the building Dec 07 '10 at 16:12

9 Answers9

51

You need to create exactly as many indexes as you need to create. No more, no less. It is as simple as that.

Everybody "knows" that an index will slow down DML statements on a table. But for some reason very few people actually bother to test just how "slow" it becomes in their context. Sometimes I get the impression that people think that adding another index will add several seconds to each inserted row, making it a game changing business tradeoff that some fictive hotshot user should decide in a board room.

I'd like to share an example that I just created on my 2 year old pc, using a standard MySQL installation. I know you tagged the question SQL Server, but the example should be easily converted. I insert 1,000,000 rows into three tables. One table without indexes, one table with one index and one table with nine indexes.

drop table numbers;
drop table one_million_rows;
drop table one_million_one_index;
drop table one_million_nine_index;

/*
|| Create a dummy table to assist in generating rows
*/
create table numbers(n int);

insert into numbers(n) values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

/*
|| Create a table consisting of 1,000,000 consecutive integers
*/   
create table one_million_rows as
    select d1.n + (d2.n * 10)
                + (d3.n * 100)
                + (d4.n * 1000)
                + (d5.n * 10000)
                + (d6.n * 100000) as n
      from numbers d1
          ,numbers d2
          ,numbers d3
          ,numbers d4
          ,numbers d5
          ,numbers d6;


/*
|| Create an empty table with 9 integer columns.
|| One column will be indexed
*/
create table one_million_one_index(
   c1 int, c2 int, c3 int
  ,c4 int, c5 int, c6 int
  ,c7 int, c8 int, c9 int
  ,index(c1)
);

/*
|| Create an empty table with 9 integer columns.
|| All nine columns will be indexed
*/
create table one_million_nine_index(
   c1 int, c2 int, c3 int
  ,c4 int, c5 int, c6 int
  ,c7 int, c8 int, c9 int
  ,index(c1), index(c2), index(c3)
  ,index(c4), index(c5), index(c6)
  ,index(c7), index(c8), index(c9)
);


/*
|| Insert 1,000,000 rows in the table with one index
*/
insert into one_million_one_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
  from one_million_rows;

/*
|| Insert 1,000,000 rows in the table with nine indexes
*/
insert into one_million_nine_index(c1,c2,c3,c4,c5,c6,c7,c8,c9)
select n, n, n, n, n, n, n, n, n
  from one_million_rows;

My timings are:

  • 1m rows into table without indexes: 0,45 seconds
  • 1m rows into table with 1 index: 1,5 seconds
  • 1m rows into table with 9 indexes: 6,98 seconds

I'm better with SQL than statistics and math, but I'd like to think that: Adding 8 indexes to my table, added (6,98-1,5) 5,48 seconds in total. Each index would then have contributed 0,685 seconds (5,48 / 8) for all 1,000,000 rows. That would mean that the added overhead per row per index would have been 0,000000685 seconds. SOMEBODY CALL THE BOARD OF DIRECTORS!

In conclusion, I'd like to say that the above test case doesn't prove a shit. It just shows that tonight, I was able to insert 1,000,000 consecutive integers into in a table in a single user environment. Your results will be different.

Ronnis
  • 12,593
  • 2
  • 32
  • 52
  • Wow! Always wanted an answer that would quantify the *perf-hit* a new index adds. Thanks for this Ronnis! – divyanshm Oct 08 '15 at 05:43
  • Would the relative overhead have changed in case of much larger tables ? – vasilakisfil Feb 05 '16 at 10:13
  • This prove nothing as index overhead performance commes mainly from concurency, Update/Delete in table an index maintenance (fragmentation, ...) – Pit Ming Jun 30 '16 at 12:06
  • 1
    @PitMing, you are right. Call the board of directors! – Ronnis Jul 01 '16 at 06:04
  • @Ronnis Beautiful explanation, but it would be better if you would have showed comparison for read operation, than the point that is asked in the question can be correctly justified. – Ravi Feb 06 '19 at 07:00
10

That is utterly ridiculous. First, you need multiple indexes in order to perfom correctly. For instance, if you have a primary key, you automatically have an index. that means you can't index anything else with the rule you described. So if you don't index foreign keys, joins will be slow and if you don't index fields used in the where clause, queries will still be slow. Yes you can have too many indexes as they do take extra time to insert and update and delete records, but no more than one is not dangerous, it is a requirement to have a system that performs well. And I have found that users tolerate a longer time to insert better than they tolerate a longer time to query.

Now the exception might be for a system that takes thousands of readings per second from some automated equipment. This is a database that generally doesn't have indexes to speed inserts. But usually these types of databases are also not used for reading, the data is transferred instead daily to a reporting database which is indexed.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
8

Yes, definitely - too many indexes on a table can be worse than no indexes at all. However, I don't think there's any good in having the "at most one index per table" rule.

For SQL Server, my rule is:

  • index any foreign key fields - this helps JOINs and is beneficial to other queries, too
  • index any other fields when it makes sense, e.g. when lots of intensive queries can benefit from it

Finding the right mix of indices - weighing the pros of speeding up queries vs. the cons of additional overhead on INSERT, UPDATE, DELETE - is not an exact science - it's more about know-how, experience, measuring, measuring, and measuring again.

Any fixed rule is bound to be more contraproductive than anything else.....

The best content on indexing comes from Kimberly Tripp - the Queen of Indexing - see her blog posts here.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
3

Unless you like very slow reads, you should have indexes. Don't go overboard, but don't be afraid of being liberal about them either. EVERY FK should be indexed. You're going to do a look up each of these columns on inserts to other tables to make sure the references are set. The index helps. As well as the fact that indexed columns are used often in joins and selects.

We have some tables that are inserted into rarely, with millions of records. Some of these tables are also quite wide. It's not uncommon for these tables to have 15+ indexes. Other tables with heavy inserting and low reads we might only have a handful of indexes- but one index per table is crazy.

Mike M.
  • 12,343
  • 1
  • 24
  • 28
  • You may want to answer this: http://stackoverflow.com/questions/4377986/how-to-index-all-foreign-keys-in-ms-sql – JeffO Dec 07 '10 at 15:32
2

Updating an index is once per insert (per index). Speed gain is for every select. So if you update infrequently and read often, then the extra work may be well worth it.

If you do different selects (meaning the columns you filter on are different), then maintaining an index for each type of query is very useful. Provided you have a limited set of columns that you query often.

But the usual advice holds: if you want to know which is fastest: profile!

Hans Kesting
  • 38,117
  • 9
  • 79
  • 111
  • 1
    Its worse than that. Eg. updating a table (the many-part of a one-to-many relation) using a non-indexed column results in a table-lock, and reading similarly is a table scan. This can lead to a serious dose of locking badly hurting concurrency. – Carlo V. Dango Dec 07 '10 at 14:19
2

You should of course be careful not to create too many indexes per table, but only ever using a single index per table is not a useful level.

How many indexes to use depends on how the table is used. A table that is updated often would generally have less indexes than one that is read much more often than it's updated.

We have some tables that are updated regularly by a job every two minutes, but they are read often by queries that vary a lot, so they have several indexes. One table for example have 24 indexes.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
1

So much depends on your schema and the queries that you normally run. For example: if you normally need to select above 60% of the rows of your table, indexes won't help you and it will be cheaper to table scan than to index scan and then lookup rows. Focused queries that select a small number of rows in different parts of the table or which are used for joins in queries will probably benefit from indexes. The right index in the right place can make or break a feature.

Indexes take space so making too many indexes on a table can be counter productive for the same reasons listed above. Scanning 5 indexes and then performing row lookups may be much more expensive than simply table scanning.

Good design is the synthesis about about knowing when to normalise and when not to. If you frequently join on a particular column, check the IO plan with the index and without. As a general rule I avoid tables with more than 20 columns. This is often a sign that the data should be normalised. More than about 5 indexes on a table and you may be using more space for the indexes than the main table, be sure that is worth it. These rules are only the lightest of guidance and so much depends on how the data will be used in queries and what your data update profile looks like.

Experiment with your query plans to see how your solution improves or degrades with an index.

Sir Wobin
  • 1,080
  • 1
  • 8
  • 12
0

Optimizing the retrieval with indexes must be carefully designed to reflect actual query patterns. Surely, for a table with Primary Key, you will have at least one clustered index (that's how data is actually stored), then any additional indexes are taking advantage of the layout of the data (clustered index).
After analyzing queries that execute against the table, you want to design an index(s) that cover them. That may mean building one or more indexes but that heavily depends on the queries themselves. That decision cannot be made just by looking at column statistics only.
For tables where it's mostly inserts, i.e. ETL tables or something, then you should not create Primary Keys, or actually drop indexes and re-create them if data changes too quickly or drop/recreated entirely. I personally would be scared to step into an environment that has a hard-coded rule of indexes per table ratio.

Roman
  • 1,177
  • 1
  • 17
  • 25
0

Every table must have a PK, which is indexed of course (generally a clustered one), then every FK should be indexed as well.
Finally you may want to index fields on which you often sort on, if their data is well differenciated: for a field with only 5 possible values in a table with 1 million records, an index will not be of a great benefit.
I tend to be minimalistic with indexes, until the db starts beeing well filled, and ...slower. It is easy to identify the bottlenecks and add just the right the indexes at that point.

iDevlop
  • 24,841
  • 11
  • 90
  • 149