1

If I have a table:

users

  • user_id, INT, PRIMARY KEY,
  • user_email_confirmed, TINYINT, (0 or 1),
  • user_email

and query:

 SELECT user_email FROM users WHERE user_id = 1 AND user_email_confirmed = 1

do I need to have an index on user_email_confirmed? Or maybe I need two-column-index on user_id, user_email_confirmed? If yes, why?

TheFrost
  • 1,265
  • 2
  • 15
  • 29
  • 1
    You don't have to have it, but it can be a good thing. Read about covering indexes here: http://stackoverflow.com/questions/609343/what-are-covering-indexes-and-covered-queries-in-sql-server The n-column index means the query can be run just from the index and therefore can be faster. – dcaswell Sep 08 '13 at 00:58
  • @user814064: It would have to be a three-column index to run this query from just the index. – Jonathan Leffler Sep 08 '13 at 00:59
  • That wasn't the point but I updated the comment -- thanks! – dcaswell Sep 08 '13 at 01:01

1 Answers1

1

If user_id is the primary key column and indexed, that is sufficient.

You should only consider making a composite index on user_id and user_email_confirmed if both the following conditions are satisfied:

  1. Your DBMS supports unique indexes with additional columns (DB2 does; I'm not sure which others do).

    CREATE UNIQUE INDEX pkx_user ON User(User_ID) INCLUDE(User_Email_Confirmed); -- DB2
    
  2. Many of your queries are going to include the user_email_confirmed column.

Otherwise, the index is wasted space. Given that you have to get the data for the row to retrieve the user_email value, there's no real cost to checking the user_email_confirmed column too. If your composite index included the user_email column too, you'd have a less efficient index (because the data for each row in the index would be larger), but you'd be able to avoid fetching the data row (it becomes an index-only query). Whether that matters is hard to judge — I doubt if the gain from not fetching the row is outweighed by the loss of efficiency in the index.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • I see. Does the situation change drastically if there are more than one flag columns (for example, from `articles`, `... WHERE article_id = 1234 AND article_moderated = 1 AND article_deactivated = 0`)? – TheFrost Sep 08 '13 at 01:13
  • Not really; the more flags there are, the less likely that enough queries use all the columns to make it worthwhile. It does vary; it depends on your system and the queries you run. But generally, you should keep it simple — the DBMS is usually smart enough to work things out. Another issue to consider is whether the primary key index will use the augmented index or whether it will stick with an index on just the primary key column. It won't use the augmented index, then you have extra space wasted storing the primary key and augmented indexes — and that slows down update operations. – Jonathan Leffler Sep 08 '13 at 01:27
  • May I ask, what does "the primary key index will use the augmented index" mean? Especially, what "augmented index" is? – TheFrost Sep 08 '13 at 01:45
  • See the example in the updated question; that is what I meant by an augmented index. It means you still have a unique index on the `user_id`, but the index data includes extra information (the `user_email_confirmed` column in the example), making it an augmented index. Most DBMS use a unique index to enforce the primary key uniqueness constraint. Will the DBMS use the augmented index to enforce the primary key constraint, or will there in fact be two indexes created (and maintained, and analyzed for use), one on just the `user_id` and the other on `user_id` and `user_email_confirmed`? – Jonathan Leffler Sep 08 '13 at 01:50
  • For anybody interested - MySQL also allows to have augmented primary key. – TheFrost Sep 08 '13 at 02:06