0

I would like to know if it is necessary to create an index for all fields within a table if one of your queries will use SELECT *.

To explain, if we had a table that 10M records and we did a SELECT * query on it would the query run faster if we have created an index for all fields within the table or does MySQL handle SELECT * in a different way to SELECT first_field, a_field, last_field.

To my understanding, if I had a query that did SELECT first_field, a_field FROM table then it would bring performance benefits if we created an index on first_field, a_field but if we use SELECT * is there even a benefit from creating an index for all fields?

Mureinik
  • 297,002
  • 52
  • 306
  • 350
Craig van Tonder
  • 7,497
  • 18
  • 64
  • 109
  • 2
    that shouldn't matter. It's the WHERE clause that is most likely to determine the index used – krock Aug 23 '15 at 07:12
  • @krock Thanks for your feedback, it leaves me even more confused though. I always figured that if we queried a field in mysql then an index should exist on that field too. Are you saying that the indexes are used for values in the WHERE clause and not the SELECT clause? :o – Craig van Tonder Aug 23 '15 at 07:15
  • Check out [this question](http://stackoverflow.com/questions/2955459/what-is-an-index-in-sql). Stated in the answer: "An index is used to speed up searching in the database". Indexes speed up searches - not retrieval of column values. – krock Aug 23 '15 at 07:21
  • @krock That is beginning to make sense now. So, i should actually be indexing the values within the WHERE clause and not the values that are being retrieved. Makes sense as that would speed up the calculation/seeking part. – Craig van Tonder Aug 23 '15 at 07:24

2 Answers2

3

Performing a SELECT * FROM mytable query would have to read all the data from the table. This could, theoretically, be done from an index if you have an index on all the columns, but it would be just faster for the database to read the table itself.

If you have a where clause, having an index on (some of) the columns you have conditions on may dramatically improve the query's performance. It's a gross simplification, but what basically happens is the following:

  1. The appropriate rows are filtered according to the where clause. It's much faster to search for these rows in an index (which is, essentially, a sorted tree) than a table (which is an unordered set of rows).
  2. For the columns that where in the index used in the previous step the values are returned.
  3. For the columns that aren't, the table is accessed (according to a pointer kept in the index).
Mureinik
  • 297,002
  • 52
  • 306
  • 350
  • That was a pro tip, thanks so much! So, its like trying to do a search and it is going 1, 2 skip a few, 43, 72, 100 but if you index the data it is sorted by these values so the required values will be say values 1-23 and there is nothing else after this to worry about? – Craig van Tonder Aug 23 '15 at 07:48
  • Just to clarify too, it seems that indexes would return the information slower than the database itsself, so it would not be good to include the select information and the where information within the index? For your point 3 however, if it was there at point 2 and it knew what values to return, why would it be faster to then go and query the database to return additional values relating to the query. Seems like a catch in the system? :) – Craig van Tonder Aug 23 '15 at 07:50
  • @IndigoIdentity I don't understand your question - indexes are a part of the database. – Mureinik Aug 23 '15 at 07:53
  • Put simply, you say that within the index the rows are sorted according to the where clause, that makes sense. But with that in mind, if they are already sorted in the best possible way, why not include the values from the SELECT clause so that at this point it is able to return this information without moving on to step 3? – Craig van Tonder Aug 23 '15 at 07:55
  • There's a trade-off here - the more columns you add to an index, the bigger it becomes, and slower to search through. If the additional column(s) aren't too large, it may indeed be a good idea (and even a common practice!) to add them to the index and save the table access in step 3. If it's a large list of large columns, the additional cost added to the filtering step (2) just isn't worth the gain. – Mureinik Aug 23 '15 at 07:59
  • Thanks a lot. Really helped me to understand this aspect of MySQL. One further question as it relates to my OP. As it would be theoretically possible to read all of the data from an index when running `SELECT * FROM mytable`, is it safe to say that theoretically it could in some cases be faster to read said data from the index itself? To my limited understanding, this is how things like TokuDB are able to leverage the read speed of the indexes and achieve performance benefits. – Craig van Tonder Aug 23 '15 at 12:40
  • @IndigoIdentity I'm not familiar with TokuDB, but generally speaking - yes. It would probably be more useful if you had a `where` clause, but yes. – Mureinik Aug 23 '15 at 13:43
3

indexing a mysql table for a column improves performance when there is a need to search or edit a row/record based on that column of that table.

for example, if there is an 'id' column and if it is a primary key; And in that case if you want to search a record using where clause on that 'id' column then you don't need to create index for the 'id' column because primary key column will act as an indexed column.

In another case, if there is an 'pid' column in the table and if it is not a primary key; Then in order to search based on 'pid' column then to improve performance it is better to create an index for the 'pid' column. That will make query fast to search the expected record.

Rashedul.Rubel
  • 3,446
  • 25
  • 36
  • Thank you for your input and feedback! Question, does that mean that the primary key will always be included in the index? So if i where to search based on the pid i would not need to include the id/key because this is already included? So the index would not be id, pid but simply pid? – Craig van Tonder Aug 23 '15 at 07:59
  • if 'id' column is primary key and use select * from table where id=1, then need not to create index for 'id' column. if use select * from table where pid=2 and it is not a primary key column then to make query faster you can create an index for 'pid' column – Rashedul.Rubel Aug 23 '15 at 08:46