Indexing a column having duplicate values

Question

Consider these tables:

seed (seedid, seedname) # PK-(seedid) 
stock (seedid, stk, storeid)  # PK-(seedid,storeid), FK-(storeid)
#InnoDB

stock: 
      seedid, stk, storeid
         1    12     81
         2    13     81
         3    14     81

         1    12     82
         2    11     82
         3    13     82
         4    12     82

Query -> select stk from stock where seedid = 'aaa' and storeid = 'yyy'.

Table stock represent stocks of several stores, hence storeid will be repeated over.

How do I index, table stock, given that it will be queried frequently using storeid ?

Primary keys as automatically indexed, and since (seedid,storeid) is already a Primary key, is there no need to index it further?

You either know you need an index. Or you wait until you know. However `storeid` is a foreign key and must be indexed. — Paul Spiegel, Feb 20 '19 at 16:01
update your question and add the tables schema .. the main query you think is involved ,, — ScaisEdge, Feb 20 '19 at 16:03
@PaulSpiegel: Thanks for your reply ! I forgot to mention that `seedid` per store will be around 65000, so `storeid` will be repeating that many times. `storeid` will be around 12000. Should I go ahead and index it? — lmao, Feb 20 '19 at 16:16
@PaulSpiegel: Today itself I read your post [https://stackoverflow.com/questions/51429997/which-approach-is-faster-for-getting-all-pois-from-mysql-mariadb-with-php-larave]. Thanks for this wonderful post. But, I couldn't ask you anything there, since I do not have enough reputation. Luckily, you are here ! Just one question What is the unit of distance returned, I guess it was meters... — lmao, Feb 20 '19 at 16:44
@lmao see my comment [here](https://stackoverflow.com/a/51469148/5563083) — Paul Spiegel, Feb 20 '19 at 23:44
@Strawberry : For your eyes...see edit. God of ZEROES might not like this discrimination against `0`'s :) — lmao, Feb 21 '19 at 12:33
@PaulSpiegel - Thank you sir, thanks a ton for your clarification. — lmao, Feb 21 '19 at 12:34

spencer7593 · Accepted Answer · 2019-02-20T21:27:10.623

Based on the incomplete specification, I would do this:

   CREATE UNIQUE INDEX stock_UX1 ON stock (storeid,seedid,stk)

This index would satisfy the requirement for an index with storeid as the leading column. (And we know will have that requirement if this is InnoDB and storeid is a foreign key.)

With such a short table row, I'd go ahead and make it a covering index, and include all of the columns. Then queries can be satisfied directly from the index pages without lookups to data pages in the underlying table.

Since we know that (seedid,storeid) is unique (given as the PRIMARY KEY), we know (storeid,seedid) is also unique, so we might as well declare the index to be UNIQUE.

There are other choices; we don't have to create that index above. We could just do this instead:

   CREATE INDEX stock_IX2 ON stock (storeid)

But that will use nearly the same amount of space, and not be as beneficial to as many possible queries.

The secondary index will contain the primary key of the table; so that second index will include the seedid column, given the PRIMARY KEY of the table. That is, the index is equivalent to this:

   CREATE INDEX stock_IX3 ON stock (storeid,seedid)

And we know the combination of those two columns is unique, so we can include the UNIQUE keyword

   CREATE UNIQUE INDEX stock_UX4 ON stock (storeid,seedid)

If we do an EXPLAIN on a query of the form

  EXPLAIN
  SELECT t.storeid
       , t.seedid
       , t.stk 
    FROM stock t 
   WHERE t.storeid = 'foo'

we are likely to see a range scan operation on the secondary index; but retrieving the value of stk column will require lookup to the data pages in the underlying table. Including the stk column in the secondary index will make the index a covering index for the query. With the index first recommended in the answer, we expect the EXPLAIN output to show "Using index".

I'll be going with your 2nd choice, as I already have an index as Primary Key`(seedid,storeid)` and Foreign Key`(storeid)` would be indexed as well. — lmao, Feb 20 '19 at 16:37
The second will use nearly as much space as the first; since the secondary index has to store the values of the PRIMARY KEY columns; all secondary indexes include the value of the `seedid` column. Omitting `seedid` column from the index definition doesn't save any space, it doesn't make the index smaller or faster. The *order* of the leading columns in the index is important. The second index definition is valid. But it's an indication that the creator of the index doesn't appreciate the details of InnoDB indexes, or the nuances of indexing strategy. — spencer7593, Feb 20 '19 at 21:14

score 1 · Answer 2 · answered Feb 20 '19 at 16:03

1

If seedid,storeid is set as a primary key, then it has a unique index already. However, queries can only use one index at a time. So if you are making regular queries for say something like 'where seedid=3 and storeid=5', that index would be used. But if you just use 'where storeid=5' it might not use the index. (You can use 'explain' in front of your query to see what indexes mysql is going to use.)

So basically you want an index that covers the columns that you need to query regularly in your where clause.

answered Feb 20 '19 at 16:03

SilicaGel

459
3
11

I guess primary key as the index is the only option, given the query I have here. – lmao Feb 20 '19 at 16:33
Thanks for your advice, Sir. – lmao Feb 21 '19 at 12:46

Indexing a column having duplicate values

2 Answers2

Linked