1

While working on a project to test the performance on a database, I came to the point of adding indexes. Having surfed a big part on the internet I'm still left with a couple of questions.

On what table/column is it a good idea to put an index?

I have different types of tables for examples a table full with predefined country names. So I believe it is a good idea to put an index on the Column country_name. I know this is good because there is a small chance I have to add new records to this table and queries will be faster when using a country_name in the where clause.

But what about more complex tables like client (or any other table that will chance a lot and contains a big amount of columns)?

What about combined indexes?

When are combined indexes a good idead, is it when I will query a lot of clients with their first_name and last_name together? Or is it better to add individual indexes to both those columns?

Paradox?

Having read this answer on stackoverflow, I'm left with a paradox. Knowing the data will increase significantly, is a reason for me to add an index. But will slow it down at the same time, as indexes slow down updates/inserts.

e.g. I have to keep a daily track of the weight of clients(>3M records). Adding an index will help me get my results faster. But I gain about 1000 new clients each day, so I'll have to insert them AND update their weights. Which means slower performance because of the inserts/updates.

mySQL specific addition

Is there an advantage on different storage engines, combined with indexes? As for now I've only used innoDB.

Community
  • 1
  • 1
Edito
  • 3,030
  • 13
  • 35
  • 67
  • I though the question is not badly written, I'm voting for 'too broad'. Tutorials, and probably even complete books have been written on the subject, and although there are some basic rules (which you should have found already), there is a lot of 'it depends' on this subject. – GolezTrol Oct 31 '15 at 21:58
  • The question isn't just whether your tables change. It's the ratio of read-only access (that benefit from the index) vs. modify (that might benefit some from an index, but also have to spend time updating it). The question doesn't seem to say anything about the amount of read-only queries. – Peter Cordes Oct 31 '15 at 22:12
  • most of time you will find yourself doing some tradeoff between two things. if your workset, say your table, is mostly read, indexing will speed up; otherwise most of the operations are changing your date indexes may slow down these operations. this most important principle is to understand your workset(data) and operations. – Muatik Oct 31 '15 at 22:14
  • @GolezTrol I agree, this question is too broad. – Shadow Oct 31 '15 at 22:57
  • @GolezTrol, lets see it like this. Is there a general formula to know when to use an index (if #selects / #inserts > 0.5)? I mean almost all tables will get updates and inserts, hence not using an index. Also adding an index on 1 column will affect the entire table. The way I see it now, an index will only be beneficial is you only use select statements on it, am I right? – Edito Oct 31 '15 at 23:36

2 Answers2

3

I'm going to focus on the "Combined Indexes" part of the question, but use that to cover several other points that I think will help you better understand indexes.

What about combined indexes?

When are combined indexes a good idead, is it when I will query a lot of clients with their first_name and last_name together? Or is it better to add individual indexes to both those columns?

Indexes are just like phone books. A phone book is a table with fields for Last_Name, First_Name, Address, and Phone_Number. This table has an index on Last_Name,First_Name. This is what you called a combined index.

Let's say you wanted to find "John Smith" in this phone book. That would work out to an query like this:

SELECT * FROM PhoneBook WHERE  First_Name = 'John' and Last_Name = 'Smith';

That's pretty easy in your phone book. Just find the section for "Smith", and then go find all the "John"s within that section.

Now imagine that instead of a combined index on Last_Name,First_Name, you had separate indexes: one for Last_Name and one for First_Name. You try to run the same query. So you open up the Last_Name index and find the section for Smith. There are a lot of them. You go to find the John's, but the First_Name fields aren't in the correct order. Maybe it's ordered by Address now instead. More likely in a database, it's in order by when this particular Mr or Ms Smith first moved to town. You'll have to go through all of the Smiths to find your phone number. That's not so good.

So we move to the First_Name index instead. You do the same process and find the section for "John". This isn't any better. We didn't specify to additionally order by last name, and so you have to go through all of the Johns to find your Smiths.

This is exactly how database indexes work. Each index is just a copy of the information included with index, stored in the order specified by the index, along with a pointer back to the full record. There are some additional optimizations, like not filling up each page the index so that you can more efficiently add new entries without having to rebuild the whole index (you only need to rebuild that page), but in a nutshell each new index is another phone book that you have to maintain. Hopefully you can see now why things COLUMN LIKE '%keyword%' searches are so bad.

The other thing to understand about indexes is they exist to support queries, not tables. You don't necessarily want to look at a table and think about what columns you'll key on. You want to look at your queries and think about what columns they use for each table.

For this reason, you may still need separate indexes for both First_Name and Last_Name. This would be when you need to support different queries that use different means to query the table. This is also why application don't always just let you search by any field. Each additional searchable field requires new indexes, which adds new performance cost to the application.

This is also the reason why it's so important to have a separate and organized database layer in your application. It helps you get a handle on what queries you really have, and therefore what indexes you really need. Good tiered application design, or a well-designed service layer for the service-oriented crowd, is really a performance thing as much as anything else, because database performance often cuts to the core of your larger application performance.

Community
  • 1
  • 1
Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • I concur. We might also mention that the addition of a composite index on `(last_name,first_name)` would make the index on just `(last_name)` redundant, unless that index on the single column is enforcing uniqueness. Any query that makes effective use of the index on the single column `(last_name)` will be able to make effective use of a composite index that has `last_name` as the leading column. – spencer7593 Nov 01 '15 at 04:02
0

Ok you need to know 2 thing: index are for increase speed of search ( select ) but will slow your changes ( insert/update/delete ) if you need to do a track, try use a table only for collect informations, and athor table to be sintetisez your info about your track. Example:

table track ( ip,date,page,... ) table hour_track ( page,number_visitator,date )

In table track you will only add, no update or delete. Table hour_track you will generate with a cronjob ( or athor thenique ) and there you will add a combinate index ( most_search, secound_most_search, ... ) . Combinated index will increase your speed because your databse need only remake 1 arbores not more, more then that if maiby you need a index for a column because there column is more used for your query you can add there column to be first of your index declaration. You can red more here

Laurentiu
  • 574
  • 6
  • 26