1

I know there are many Q's about this issue.
I have read many of them.
I know that for performance in general it is better to use the high cardinality fields (most selective) first in the index.

having said that, I want to branch of a comment that was made in: Order of columns in a multi-column index in MySQL and make it an actual Q.

say that I save country id (low cardinality) and person id (high cardinality).
same person id can exist in several countries.

I sometimes need to retrieve all people from a country, and sometimes need to reach a specific person from a specific country.

For the country query I need an index on country or on that starts with country. but what about the second query?

Should I use a single index with country_id,person_id (which is sort of "wrong order of fields" cardinality wise) or add another index person_id,country_id and thus have two indexes - one for each of those queries?

How would that change if:
1) I also needed sometimes to reach all the people with the same person_id (regardless of country) ?
2) instead of country I had a hierarchy of continent,country,region,city,person ?

I don't know how much does it matter but if it does - assume the Q is about MySQL.

UPDATE 1
I am not sure I was clear enough...

Because I said that I will need to ask about all the people in a country ( i.e. ... where country_id=? ) it is obvious that I should have an index on country...

When I need to query on both country and person I can create a query like: ... where country_id=? and person_id=? and just extend the index to be (country_id,user_id) and use it for both queries, or I could leave the country index alone (for ... where country_id=? queries) and add a new (user_id,country_id) and write my two field queries as ... where person_id=? and country_id=? which is more efficient.

so option 1: single but less efficient (country_id,user_id) index querying for ... where country_id=? and person_id=? or ... where country_id=?

option 2: two indexes (country_id) index and (person_id,country_id) index for ... where country_id=? and ... where person_id=? and country_id=?

Having written this like this I also recognize that the second option also provides an answer to 1st Q variant where a ...where person_id=? query is also needed...

but still for the basic scenario - would you go with option 1 or 2 ... ?

Community
  • 1
  • 1
epeleg
  • 10,347
  • 17
  • 101
  • 151
  • What would the other join column be in that situation (change #2)? In general, no, the actual RDBMS doesn't matter too much here - the issue will affect all of them. Probably, you _will_ want both indices. Many optimizers will spit out information about desired indices - does it mention it? DBAs should monitor which indices are recommended/used and add/remove as appropriate. – Clockwork-Muse Apr 27 '14 at 10:01
  • re #2 consider a non normalized table that just has the said fields... – epeleg Apr 28 '14 at 11:07
  • 1
    The fact that you'd have all those columns in one table doesn't really change anything - you'd just have the indices cover the needed columns (the wider an index gets, the more specific it gets, and the less widely useful it is). If you write a query for everyone in a specific country, you (usually) want an index starting with `country_id`. If you write a query for everybody with the same `person_id`, you (usually) want an index starting on that column. You can probably get better help on [dba.se], especially on indexing strategy. – Clockwork-Muse Apr 28 '14 at 11:21
  • thanks for the link to DBA SE. I did not know it existed... – epeleg Apr 28 '14 at 12:19

1 Answers1

0

You can use a single index "country_id, person_id" only if you know the country id for both queries. MySQL will not use the index if you are only querying on person_id. An explaination of that is here: http://www.ovaistariq.net/17/mysql-indexes-multi-column-indexes-and-order-of-columns/

1) In the cases of a person_id only query a "country_id, person_id" index could not be used.

2) In the case of this type of query you could only use a index starting with continent, if you always were querying on this (otherwise the index would be ignored).

If you meet the above criteria such that one index would actually work for both queries, then I think it's fine to just have one - with the most selective first (assuming that applies to your queries).

If you do need two, I would only include in the index the fields that will be used (i.e. they are in your WHERE, ORDER BY or JOIN clauses). The cost of having unnecessary columns is slower updates and more memory (and disk) used for the index.

UPDATE 1

For the basic scenario I would go with option 1 as will be more space efficient which will save on the memory used for indexes, freeing it up for other indexes or page cache. If you did need a ... where person_id=? style query, option 2 would be a good solution to that.