How to decide which fields must be indexed in a database table

Question

Explanation

I have a table which does not have a primary key (not even a composite one).

The table is for storing the time slots (opening hours and food delivery available hours) of the food shops. Let's call the table "business_hours" and the main fields are as below.

shop_id
day (0 - 6, means Sunday - Saturday)
type (open, delivery)
start_time
end_time

As an example, if shop A is open on Monday from 9.00am to 01.00pm and from 05.00pm to 10.00pm, there will be two records in the business_hours table for this scenario.

-----------------------------------------------
| shop_id | day | type | start_time | end_time
-----------------------------------------------
| 1000    | 1   | open | 09:00:00   | 13:00:00
-----------------------------------------------
| 1000    | 1   | open | 17:00:00   | 22:00:00
-----------------------------------------------

When I query this table, I will always use shop_id as the first condition in the WHERE clause.

Ex:

SELECT COUNT(*) FROM business_hours WHERE shop_id = 1000 AND day = 1 AND type = 'open' AND start_time <= '13:29:00' AND end_time > '13:29:00';

Question

Is an index on "shop_id" enough, or should the "day" and "type" fields also be indexed?

It would also help if you could explain how indexing really works.

Possible duplicate of [How does database indexing work?](http://stackoverflow.com/questions/1108/how-does-database-indexing-work) — Ralph Melhem, May 12 '16 at 05:34
Look at your queries and [_my indexing cookbook_](http://mysql.rjweb.org/doc.php/index_cookbook_mysql) to construct the optimal indexes. — Rick James, May 24 '16 at 00:33

score 2 · Accepted Answer · edited May 23 '17 at 12:31

It depends on several factors that you should specify:

How fast will the data grow?
What is the estimated table size in rows?
What queries will be run against that table?
How fast do you expect the queries to run?

Think of it like this: some service will make thousands of inserts of new records per hour, the old records will be archived nightly, and reports are to be created nightly from that table. In such a case you may prefer not to create many indexes, since they slow down inserts.

On the other hand if your table will grow and change slowly and many users will run queries against it, you need to have proper indexes to speed up queries.

If you can, try to create a clustered unique primary key that most queries can benefit from. If your data forms a timeline and most queries fetch ranges of data by datetime criteria (such as from - to), it is better to include the datetime in the clustered index; that will give you the fastest query performance.

So something like this should give you the best performance for the SELECT in the question. (But you then cannot store duplicate business hours for one shop and type.)

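CREATE TABLE Business_hours
( shop_id INT NOT NULL
, day INT NOT NULL
, type VARCHAR(10) NOT NULL     -- 'open' or 'delivery'; column type assumed
, start_time TIME NOT NULL      -- column type assumed
, end_time TIME NOT NULL        -- column type assumed
, CONSTRAINT Business_hours_PK
    PRIMARY KEY (shop_id, day, type, start_time, end_time)     -- your clustered index
);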

Simply creating an index on the columns used in the SELECT (all of them, or just the most frequently used ones) will speed up your query too:

CREATE INDEX BusinessHours_IX ON business_hours (shop_id,day,type, start_time, end_time);

The difference between a clustered and a non-clustered index is that a clustered index determines the order in which the rows are stored on disk.

You can use EXPLAIN to check whether a query uses an index and so spot missing indexes; see this answer.
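As a rough illustration of reading a query plan before and after adding the composite index, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs without a server. SQLite's EXPLAIN QUERY PLAN is a stand-in for MySQL's EXPLAIN, whose output looks different, and the TEXT column types and the helper name plan are assumptions made for the example.

```python
import sqlite3

# SQLite stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE business_hours (
           shop_id INTEGER NOT NULL,
           day INTEGER NOT NULL,
           type TEXT NOT NULL,
           start_time TEXT NOT NULL,   -- 'HH:MM:SS' strings (assumed type)
           end_time TEXT NOT NULL)"""
)
conn.executemany(
    "INSERT INTO business_hours VALUES (?, ?, ?, ?, ?)",
    [
        (1000, 1, "open", "09:00:00", "13:00:00"),
        (1000, 1, "open", "17:00:00", "22:00:00"),
    ],
)

QUERY = """SELECT COUNT(*) FROM business_hours
           WHERE shop_id = 1000 AND day = 1 AND type = 'open'
             AND start_time <= '13:29:00' AND end_time > '13:29:00'"""


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


# Without any index the only option is to read the whole table.
plan_without_index = plan(QUERY)

conn.execute(
    "CREATE INDEX BusinessHours_IX ON business_hours "
    "(shop_id, day, type, start_time, end_time)"
)

# With the composite index the plan should name BusinessHours_IX.
plan_with_index = plan(QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact wording of the plan lines varies between SQLite versions, so compare the two plans rather than relying on a particular string.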

For more detail, see this blog.

score 1 · Answer 2 · answered May 12 '16 at 05:59

Yes, you can create a clustered index on these columns (shop_id, day, type). I created an index like this:

CREATE CLUSTERED INDEX Ix ON business_hours (shop_id, day, type)

Use this index in your select query like this:

SELECT COUNT(*) FROM business_hours WITH (INDEX (Ix)) WHERE shop_id = 1000 AND day = 1 AND type = 'open' AND start_time <= '13:29:00' AND end_time > '13:29:00';

You will get the result fast. But if the table already has a primary key, do not create a clustered index; create a non-clustered index instead.

Did you see [tag:mysql] tag before answering? — N.B., May 12 '16 at 07:18

score 1 · Answer 3 · answered May 12 '16 at 06:28

It depends on your usage. If you are not updating the records, use a clustered index:

CREATE CLUSTERED INDEX Saleperday ON business_hours (shop_id,day,type);

because a clustered index is traversed along the B-tree and stores the entire row in the node itself, so searching is fast. But updating records is costly in memory, as it moves the entire row and creates a new entry for the same record.

Otherwise, if you are updating the records, use a non-clustered index.

If you are building a data warehouse, use columnstore indexes.

For a better understanding, you can go to these links:

http://www.programmerinterview.com/index.php/database-sql/clustered-vs-non-clustered-index/

http://www.patrickkeisler.com/2014/04/what-is-non-clustered-columnstore-index.html

http://searchsqlserver.techtarget.com/feature/SQL-Server-2014-columnstore-index-the-good-the-bad-and-the-clustered

Please reply for answer.

score 1 · Answer 4 · answered May 12 '16 at 07:02

Having decided against a primary key means the following would be allowed:

| shop_id | day | type   | start_time | end_time
+---------+-----+--------+------------+---------
| 1000    | 1   | open   | 09:00:00   | 13:00:00
| 1000    | 1   | open   | 09:00:00   | 13:00:00
| 1000    | 1   | open   | 17:00:00   | 22:00:00
| 1000    | 1   | closed | 17:00:00   | 22:00:00

So you can have duplicate entries that may lead to strange query results and even have a shop open and closed in the very same time range. (But well, we all know that even with a primary key you'd still need a before-insert trigger to detect a range overlapping, e.g. 12:00-15:00 vs. 13:00-16:00, and throw an error in case. - How I wish there were some built-in range detection, so we could, say, have a unique index on (shop_id, day, range(start_time, end_time)).)
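To make the before-insert trigger idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (MySQL trigger syntax differs). The trigger name business_hours_no_overlap, the helper try_insert and the TEXT time columns are assumptions made for the example. The check relies on the usual rule that two ranges overlap when each one starts before the other one ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE business_hours (
        shop_id INTEGER NOT NULL,
        day INTEGER NOT NULL,
        type TEXT NOT NULL,
        start_time TEXT NOT NULL,   -- 'HH:MM:SS' strings compare correctly as text
        end_time TEXT NOT NULL,
        PRIMARY KEY (shop_id, day, type, start_time, end_time)
    );

    -- Hypothetical trigger: reject a new row whose time range overlaps an
    -- existing row for the same shop, day and type.
    CREATE TRIGGER business_hours_no_overlap
    BEFORE INSERT ON business_hours
    BEGIN
        SELECT RAISE(ABORT, 'overlapping time range')
        WHERE EXISTS (
            SELECT 1 FROM business_hours
            WHERE shop_id = NEW.shop_id AND day = NEW.day AND type = NEW.type
              AND start_time < NEW.end_time AND NEW.start_time < end_time
        );
    END;
    """
)


def try_insert(row):
    """Insert one row; return True if it was accepted, False if rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO business_hours VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.DatabaseError:
        return False


results = [
    try_insert((1000, 1, "open", "12:00:00", "15:00:00")),  # first range
    try_insert((1000, 1, "open", "13:00:00", "16:00:00")),  # overlaps the first
    try_insert((1000, 1, "open", "17:00:00", "22:00:00")),  # separate range
]
print(results)
```

The primary key alone would only have rejected an exact duplicate; the 13:00-16:00 row is a different key, so only the trigger catches it.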

As to your question: Provided your database is built well, you already have a foreign key on shop_id. You don't need any further index as long as you consider your queries fast enough.

Once you think you need to speed them up, you can add composite indexes as needed. That would usually be an index on all columns in the slow query's WHERE clause. If that still doesn't suffice add the columns that are in the GROUP BY clause, if any. Next step would be to add the columns of the HAVING clause, if any. Next would be the columns of the ORDER BY clause. And the last step would be to even add all columns in your SELECT clause, which would give you a covering index, i.e. all data needed for the query would be in the index and the table itself would hence not have to be accessed any longer.
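The last step, the covering index, can be observed in a query plan. The sketch below is only an illustration: it uses Python's built-in sqlite3 module in place of MySQL, and the index names ix_where and ix_covering, the helper plan and the sample query are invented for the example. In MySQL the comparable signal is "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_hours "
    "(shop_id INTEGER, day INTEGER, type TEXT, start_time TEXT, end_time TEXT)"
)

# Sample query: the WHERE clause uses three columns, the SELECT list two more.
QUERY = """SELECT start_time, end_time FROM business_hours
           WHERE shop_id = 1000 AND day = 1 AND type = 'open'"""


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


# Step 1: index only the WHERE columns. Matching rows are found through the
# index, but start_time and end_time still have to be read from the table.
conn.execute("CREATE INDEX ix_where ON business_hours (shop_id, day, type)")
plan_where_only = plan(QUERY)

# Step 2: also include the selected columns, so the index alone can answer
# the query (a covering index).
conn.execute("DROP INDEX ix_where")
conn.execute(
    "CREATE INDEX ix_covering ON business_hours "
    "(shop_id, day, type, start_time, end_time)"
)
plan_covering = plan(QUERY)

print("WHERE columns only:", plan_where_only)
print("covering index:    ", plan_covering)
```

The wider index costs more storage and more work on every insert, which is why it makes sense as a last step rather than a default.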

But as mentioned: As long as you don't have performance issues, you don't have to add any composite indexes.

score 1 · Answer 5 · edited Apr 27 '20 at 11:35

To decide which fields must be indexed in a database table, you need to observe the behavior of each query sent to the table. Indexes are the means of providing an efficient access path between the application and the data. The index provides the access path, so when a query asks the database for data, the database knows where to go to retrieve it.

Here is some official Microsoft documentation:

Clustered Indexes

A clustered index stores the actual table data pages at the leaf level, and the table data is ordered physically around the key. A table can have only one clustered index, and when this index is created, the following events also occur:

• Table data is rearranged.
• New index pages are created.
• All nonclustered indexes within the database are rebuilt.

As a result, there are many disk I/O operations and extensive use of system and memory resources. If you plan to create a clustered index, be sure you have free space equal to at least 1.5 times the amount of data in the table. The extra free space ensures that you have enough space to complete the operation efficiently.

Nonclustered Indexes

In a nonclustered index, pages at the leaf level contain a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. If the table has a clustered index, the bookmark indicates the clustered index key. If the table does not have a clustered index, the bookmark is an actual row locator. When you create a nonclustered index, SQL Server creates the required index pages but does not rearrange table data.

The indexing method recommended by professionals comprises three phases: monitor, analyze, and then implement the index. That means you need to observe the behavior of your database when you run a query, and then work toward the best performance.

SQL Server uses these operations to fetch the data:

Table scan: Reads the entire heap and, most likely, passes all the data to a secondary filter operation

Index scan: Reads the entire leaf level (every row) of the clustered index or non-clustered index. The index scan operation might filter the rows and return only those rows that meet the criteria, or it might pass all the rows to another filter operation depending on the complexity of the criteria. The data may or may not be ordered.

Index seek: Locates specific row(s) data using the index and returns only the selected rows in an ordered list
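The difference between a scan and a seek also depends on which columns of a composite index the query constrains. The sketch below is a rough analogy only: it uses Python's built-in sqlite3 module, where a plan line starting with SCAN corresponds roughly to a scan and SEARCH to a seek. SQL Server's operators and plan format are different, and the index name ix_shop_day_type and the helper plan are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_hours "
    "(shop_id INTEGER, day INTEGER, type TEXT, start_time TEXT, end_time TEXT)"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


# No index yet: every row has to be read (comparable to a table scan).
plan_no_index = plan("SELECT * FROM business_hours WHERE shop_id = 1000")

conn.execute("CREATE INDEX ix_shop_day_type ON business_hours (shop_id, day, type)")

# The filter skips the leading index column (shop_id), so the index cannot be
# used to jump straight to the matching rows; the plan stays a scan.
plan_skips_leading_column = plan("SELECT * FROM business_hours WHERE day = 1")

# The filter starts with the leading column, so the engine can descend the
# index directly to the matching entries (comparable to an index seek).
plan_uses_leading_columns = plan(
    "SELECT * FROM business_hours WHERE shop_id = 1000 AND day = 1"
)

print(plan_no_index)
print(plan_skips_leading_column)
print(plan_uses_leading_columns)
```

This is why a query that always filters on shop_id first is served well by an index whose first column is shop_id, with day and type following it.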

So, once you know that, you can run the query, use the Display Estimated Execution Plan option, and analyze the performance. I recommend reading the posts SQL SERVER – Index Seek Vs. Index Scan and Optimizing Your Query Plans with the SQL
