4

Here is what I have :

table content : cat_id product_id data1 data2 etc.
the categories are not unique obviously. the product ids are unique.

2 queries : 1 -- SELECT * WHERE cat_id = :cat - must be as quick as possible 2 -- SELECT * WHERE product_id = :prodId In second select, I can add : AND cat_id = :cat

What is the more efficient ?

  • 1 - index (not unique) on cat_id (good for select 1)
  • 2 - primary key on product_id (unique -> excellent for select 2)
  • 3 - index (not unique) on cat_id + PK on product_id (good for 1 & 2 separately)
  • 4 - unique constraint with composite [cat_id+product_id] (good for 1 & 2 together)
  • 5 - same as 4, but defining the composite as PK
  • 6 - composite (4 or 5) + single index/PK

For information, I'll have around 20 products in each category and a lot of categories (say 3000) - And (as it is unique in table) ONE product belongs to only ONE category - In fact, that is not really cats and products, that is for the simplicity of explaination;)

thanks!

Rémi Levassor
  • 151
  • 1
  • 8
  • Just a heads up in option 4: If I recall correctly, MySQL can use the first parts of a composite key. I.e.: if you define your keys as in option 4 and then you build a query with "WHERE cat_id=...", MySQL is able to use the index (not so with "WHERE product_id=..."). http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html – Sebastianb Apr 29 '16 at 14:02
  • @Sebastianb - Thanks, that is the kind of things I don't know : how are created -and later used- composite keys by the engine. So, say in phpMyAdmin, the ORDER in wich we set the rows when defining a composite index is important? [cat_id+product_id] != [product_id+cat_id]? I added 5th option. adding the composite as PK is ever better, no? – Rémi Levassor Apr 29 '16 at 14:11
  • As long as you have NOT NULL cats and products, you can just PK the composite key and I guess it would be fine (check here for differences between unique indexes and PK's: http://stackoverflow.com/questions/487314/primary-key-or-unique-index). However, if you want to find a product_id you won't be using the composite index, so I guess you would be better of just declaring product_id as PK and cat_id as index. – Sebastianb Apr 29 '16 at 14:56

2 Answers2

5

A database without Primary key is only half dressed and according to you product_id is an ideal candidate for a primary key, so let choose that. The primary key will be used in

SELECT * WHERE product_id = :prodId

It does not matter if and cat_id = :cat_id becomes a part of the query or not unless you have thousands of cat_ids associated with each product_id.

Then choose an index on cat_id. This will be used on

SELECT * WHERE cat_id = :cat

This will be very quick if the cardinality of the data is good. That means there is a wide distribution of cat_ids in the table. The index of cat_id will not be used in the first query. Thus you have two different indexes and both queries can be expected to be really quick.

Since [cat_id+product_id] != [product_id+cat_id] when it comes to indexing, if you have only one composite index one or the other will be slow.

For example, suppose we had a composite index on (cat_id, product_id) now the following query cannot make use of this index.

 SELECT * FROM tablename WHERE product_id = :prodId

But both these queries can use the (cat_id, product_id) index

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :catId

So in summary. Choose 1 and 2. But if the number of cat_ids are small or there are lots of cat_ids associated with each product_id choose 4 but make sure that the primary key is also in place.

e4c5
  • 52,766
  • 11
  • 101
  • 134
  • Thanks! I am currently trying to understand it all:) For information, I'll have around 20 products in each category and a lot of categories (say 3000) - And ONE product belongs to only ONE category - In fact, that is not *really* cats and products, that was for the simplicity of explaination;) – Rémi Levassor Apr 29 '16 at 14:25
  • So in other words only about 600,000 rows per table at most? You probably don't need the composite index at all for this quantity of data. – e4c5 Apr 29 '16 at 14:31
  • 1
    To give you an idea, 600,000 rows with both columns being ints will only be 480k in size. mysql might not use an index at all here because that amount of data can be read instantly – e4c5 Apr 29 '16 at 14:35
  • ahh who knows? The 3000 may increase if the site is working well:) even for 200000 rows, you keep on 1+2 ? And what about 5th option defined in this order : PK using [cat_id+product_id] -and- I make my 2nd query like this : SELECT * WHERE cat_id = :cat AND product_id = :prodId --- note : I don't know either : Is the order IN the WHERE close is important (I'm using pdo) ? – Rémi Levassor Apr 29 '16 at 14:39
  • Thanks for the idea of size being used. This is "premature optimisation" I know:) but it's mainly for trying to do the better way possible – Rémi Levassor Apr 29 '16 at 14:43
  • at 200,000 rows, if the queries are slow you would need to run 'explain select ... ' to see what indexes are really being used. But I would still stick to those two. The order of columns in the and condition does not matter. It matters only when you use a single column in the where clause. The query planner starts matching the index from the left. – e4c5 Apr 29 '16 at 14:45
  • thanks for all! I just not quite undestand :"It matters only when you use a single column in the where clause" - if single, there is no "order"? - ahh ok you're saying : PHYSICAL order in table is relevant only if SQL call has only one WHERE clause... – Rémi Levassor Apr 29 '16 at 14:55
  • Keep in mind that if you use this table in several joins, the indexing might actually make a diference: I had a table with less than 200 entries, but a bad use of indexes made a query with several joins take seconds instead of ms. – Sebastianb Apr 29 '16 at 14:58
  • Physical ordering of the table matters only if you have an ORDER BY clause in your query. I will update my answer to explain more about the composite indexes. – e4c5 Apr 29 '16 at 15:03
  • yes @Sebastianb agreed. some times it might be so, but it also depends a great deal on the type of join and 'what side' the table in question is on and also whether the where cllause refers to any of it's columns or not. – e4c5 Apr 29 '16 at 15:06
  • last last question : According to your answers I choose options 1+2 --> my 2nd query is better with *WHERE cat AND product_id* than only with *WHERE product_id* --- right? – Rémi Levassor Apr 29 '16 at 15:19
  • 1
    Might not always be true. Mysql is not good at using two indexes per table. So this is why i said if there are many thousands of matching cats or product you might need a composite index. Even if you do a composite index, you would still want to keep the primary key because you will eventually want to do a JOIN on it and as @sebastianb pointed out that might be slow if you don't have an index. to be more specific, primary keys are usually the fields that you join on. – e4c5 Apr 29 '16 at 15:24
  • silly me! of course *WHERE product_id* is sufficient because option 2 make it product_id unique... – Rémi Levassor Apr 29 '16 at 15:36
  • @e4c5 I re read your post. When you say (twice) : "lots/thousands of cat_ids associated with each product_id", didn't you really mean : "lots/thousand of PRODUCT_ids associated with each CAT_id" instead? Because in my table, for example : cat 3 can hold product 3, 7, 56, 5, 987 and 9. But these products ONLY belong to this cat 3. – Rémi Levassor Sep 26 '16 at 09:37
  • that maybe what I meant but I can't remember now :-) – e4c5 Sep 26 '16 at 09:39
1

If these are your only two queries:

SELECT * FROM tablename WHERE cat_id = :cat_id and product_id = :prodId
SELECT * FROM tablename WHERE cat_id = :cat_id

and you have some other way to assure that product_id is UNIQUE, then you need only:

PRIMARY KEY(cat_id, product_id)

It is optimal for both SELECTs.

It is better than INDEX(cat_id) because (1) secondary keys have to finish their work with a PK lookup, and (2) all the cat rows are adjacent, thereby more efficient.

If product_id is actually an AUTO_INCREMENT, then add

INDEX(product_id)

No, you don't need to say UNIQUE (unless you are prone to deliberately trying to insert duplicate product_ids). The only thing that AI requires is that the id be first in some index so that it can do the equivalent of SELECT max(id) whenever mysqld is restarted.

My recommendations apply regardless of the size of the table.

The order of clauses in WHERE does not matter.

JOINs do not require anything in particular. It is slightly more efficient to JOIN on a PRIMARY KEY than on a secondary key, which is a lot more efficient (but still possible) than on a non-indexed column(s).

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • My second query (in my original question) can use cat_id in the WHERE clause. So, using "WHERE cat_id = :cat_id AND product_id = :prodId" is possible. As you said, the 5th OPTION seems to fill all expectations (even for JOINs). But just to be sure about "adjacent" and AutoIncrement : my table can look like (cat/prod) : 2/27, 4/34, 1/37, 2/12, 5/17, 2/68... --> cats are NOT adjacents and products are NOT AI... do you still recommend OPTION 5 ? – Rémi Levassor Sep 26 '16 at 09:49
  • With `ENGINE=InnoDB`, the rows will be stored in this order: 1/37, 2/17, 2/27, 2/68, 4/34, 5/17. And, since it is efficient to linearly scan a portion of the table, you get some advantage from "clustering"/"adjacency". If you also have an auto_increment, that messes up things. You have a good unique key, there is not need to also have an auto_increment. – Rick James Sep 26 '16 at 18:54
  • The _data_ (in InnoDB) is stored in PK order. Any new `INSERT` will locate the proper (based on PK) place in the data to store the new row. That is, you pay some cost during `INSERT`, and reap a benefit during `SELECT`. – Rick James Sep 26 '16 at 18:55