2

I am confused as to how best to index a table in MySQL and need help on the best type of index construction to use. Currently I am using a unique-key index on this table but do not know if this is the best approach to use and in some situations I cannot use this type of indexing due to MySQL limitations.

The table consists of a primary key and n-columns, in this scenario to keep it simple n=4. So the table looks like this: pk, col1, col2, col3, col4

The values in col1-n are VARCHARs typically with a length between 1 to 4 characters. The primary key is a concatenation of the col values. So typical rows could look like the following:

A:B:C:D, A, B, C, D
A:B:C:E, A, B, C, E
A:B:F:F, A, B, F, F

Where the first element is the primary key, and subsequent elements are col1, col2, etc.

The table needs to be optimised for queries, not inserts. The queries that I wish to perform will have a WHERE clause where we know some of the values in columns 1-4. So for example I might want to find all rows where the second column is 'B' or 'C'. Once I have the primary key I use this to JOIN another table.

I was creating a unique key on col1-4 (as they are unique). The problem is, as soon as n becomes large (>16), I can no longer create a unique key index (MySQL is limited to 16 columns for unique key constraints). This is not a problem as the primary key ensures uniqueness. However, I am unsure of two things:

a) Is the unique key a good index to use in order to optimise the speed of the queries?

b) When I can not use a unique key, what index should I use?

I have the following options, and I’m not sure which (if any) is the best:

a) Create a single index on (col1, col2, col3, col4)

b) Create an index per column (col1), (col2)…(col-n)

c) Create an index per col, with the pk included (pk, col1), (pk, col2), (pk, col-n)

Any help you can provide is greatly appreciated.

Thanks

Phil

Phil
  • 1,897
  • 4
  • 25
  • 48
  • I'd always recommend to use an integer as your PK. why not add an ID column? – Fr0zenFyr Apr 03 '13 at 12:37
  • In production environment, prefer to use an AI (auto-incremented) integer for primary. – JoDev Apr 03 '13 at 12:37
  • @Fr0zenFyr why use an arbitrary ID if there's a meaningful one available? – Sepster Apr 03 '13 at 12:39
  • @JoDev OP stated insert performance not an issue... So what benefit is an auto increment ID in this scenario? – Sepster Apr 03 '13 at 12:41
  • From my very brief scan of the mySql index docco, it looks as though a LIKE against your PK (using wild cards for unknown 'columns' in the concatenation, so long as first char is not a wildcard) might perform reasonably well (at least, there's no explicit warnings suggesting string indexes don't perform well). Hence the other columns may not require indexes at all. But that doesn't _fit_ with my expectations generally so wouldn't say I'm _recommending_ that... But might be worth testing in your specific circumstance? – Sepster Apr 03 '13 at 12:48
  • 1
    @Sepster: In this scenario, his INT auto-inc will act as a primary key and the existing VARCHAR column may be used as the unique key. Look at the data and you would know that in a worst case, if he ever had to update any column, the primary key must be changed. An auto-inc INT pk doesn't mean anything and doest need to be updated. He mentioned that he will use the pk for JOIN queries with another table where he needs to have a an fk which will be long, I'd say that an INT will do less I/O during the queries are executed because these will consume less space. – Fr0zenFyr Apr 03 '13 at 13:01
  • @Fr0zenFyr but OP needs to maintain a concatenation of Cols 1..n _somewhere_, and that will always need to be updated (along with any underlying index), so unless the PK is clustered (_is it_, in mySql?) I don't see the advantage. But OP stated insert performance not an issue so that is moot. As for JOIN on Int vs. Chars, I'd be very surprised if column data (or rather, index data) is read from disk row-by-row (or rather, node-by-node) during a join such that disk IO wasn't comparable under both scenarios... but if you have it I would be interested in an authoritative link. – Sepster Apr 03 '13 at 13:30
  • @Fr0zenFyr refer http://stackoverflow.com/questions/332300/is-there-a-real-performance-difference-between-int-and-varchar-primary-keys for some different POV. Arguments for and against, but certainly nothing concrete either way, such that I think it's valid to not automatically assume an INT PK is required (which is not to say it's not required, either!). – Sepster Apr 03 '13 at 13:36

2 Answers2

2

An index on (col1, col2, col3, col4) can only be used, if the WHERE clause contains a condition on the first columns. That means, if the query does not contain a condition on col1, the index cannot be used at all (see Multiple-Column Indexes). If you have such queries, additional indices should be defined. These might be (col2, col3, col4), (col3, col4) and (col4).

On the other hand, separate indices on (col1), (col2), (col3) and (col4) are also a good choice. Int that case, there is no need to include the primary key in the indices. I'd prefer this solution over the solution mentioned above.

I find your choice of primary key strange. If (col1, col2, col3, col4) is unique, use that as a primary key. If you do not want a primary key on four columns (most people don't), the next choice is often a surrogate key (i.e. an auto_increment column in MySQL). In that case, a unique key on (col1, col2, col3, col4) enforces data integrity.

Oswald
  • 31,254
  • 3
  • 43
  • 68
  • The point of concatenated PK, I think, was because when eg n gets large, a compound key becomes impractical, yet the PK remains "searchable". Regardless of that, does this answer still apply to the OPs scenario where n>16? – Sepster Apr 03 '13 at 12:53
  • I use the primary key column, as my query response requires the concatenated key. As this was guaranteed to be unique I did not see the need for a separate auto-increment column. I do use the Primary Key to JOIN to another table, so if this design slows the query in any way, then perhaps I should change it. I could make the current Primary Key a unique key, and have an auto inc pk. And then index my other columns individually in the way you suggested. – Phil Apr 03 '13 at 12:57
  • @Phil: there is definitely a performance issue due to the I/O on VARCHAR pk. It is alright if the size of VARCHAR is small. ;) – Fr0zenFyr Apr 03 '13 at 13:13
  • @Oswald Thanks for taking the time to answer this. I'll use a separate indices. – Phil Apr 03 '13 at 16:52
0

MySQL is able to merge join several indexes within a single table on PK, as long as you are searching for exact key values (not ranges).

So if you create separate indexes on col1 to colN, you may run this query:

SELECT  *
FROM    mytable
WHERE   col2 = 'B'
        OR
        col3 = 'C'

which will result in the indexes on col2 and col3 merge joined (you will see it as index_merge using union(col2, col3) in the EXPLAIN output).

To ensure uniqueness, it's enough to declare your first column the PRIMARY KEY, so as long as you maintain your data consistency (PK value is indeed the col* values concatenated and separated), your data uniqueness will be policed by the PK.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • Not about this VARCHAR pk situation but I remember running into a situation where I had too much of junk(data duplicity) in DB with no unique index and just a pk(INT) for indexing. I couldn't really find a way to maintain consistency of data in my development environment(lucky me!). – Fr0zenFyr Apr 03 '13 at 13:09