0

I have this table :

CREATE TABLE `villes_france` (
  `code_postal` varchar(10) NOT NULL DEFAULT '',
  `code_insee` varchar(10) DEFAULT NULL,
  `ville` varchar(255) DEFAULT NULL,
  `region_rsi` varchar(2) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;

It contains 36826 rows, one for each french city. The interesting fields here are code_postal (zipcode) and ville (city).

I use it mainly for autocompletion : when something in written in one input, both input are filled.

Here is my query :

if($source == 'cp'){
    $searchSQL = "SELECT code_postal as cp, ville FROM villes_france
                  WHERE code_postal LIKE :cp LIMIT 20";
}else{
    $searchSQL = "SELECT code_postal as cp, ville FROM villes_france
                  WHERE ville LIKE :ville LIMIT 20";
}

(FYI a "%" is appended to :cp or :ville)

Neither of these field is unique (french cities can share a same zipcode) This table will be almost never updated, but a lot of select will be performed.

How would you index this table ?

I have read a lot about indexes but I would like some insight before using them for real.

Rick James
  • 135,179
  • 13
  • 127
  • 222
FLX
  • 2,626
  • 4
  • 26
  • 57
  • I think your problem is related to this http://stackoverflow.com/questions/3695768/table-with-80-million-records-and-adding-an-index-takes-more-than-18-hours-or-f – erfjax Feb 03 '16 at 15:41

2 Answers2

1

Initial choice is

INDEX(code_postal),
INDEX(ville)

One works for one query; the other for the other.

A better choice would be to provide a 'covering' index for each query:

INDEX(code_postal, ville),
INDEX(ville, code_postal)

Yet another cookbook on indexing.

Suggest you not do the lookup until the user has typed at least 2 characters; there is no use showing the first 20 out of thousands of choices.

You did not specify a COLLATION on the column/table, so you probably have latin1_swedish_ci? It may actually be best for your French application since E=e=È=É=Ê=Ë=è=é=ê=ë. (latin1_general_ci treats things as separate: E=e < È=è < É=é < Ê=ê < Ë=ë .) Of note, the swedish collation includes N=n=Ñ=ñ.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Yes it's latin1_swedish_ci. I didn't know about this, it's very interesting. It explains maybe why my file was not utf-8 from the start. What if I add a third field and use the 3 with LEFT JOIN ? Should I add an index for every combination of the 3 fields ? Thanks – FLX Feb 04 '16 at 08:54
  • `latin1` is fine for Western European languages. `utf8` would also work, and becomes a requirement as you move east. – Rick James Feb 04 '16 at 16:32
  • If you have 3 fields, then, yes, 3 indexes (not 6). Each of the 3 would start with a field that the `WHERE` clause is using; the other two fields merely tag along because you will need them. Don't go past, say, 4. – Rick James Feb 04 '16 at 16:34
  • Thanks for your valuable advice. Helps a lot. Your cookbook is also interesting – FLX Feb 04 '16 at 16:46
  • The cookbook saves me a lot of typing -- I half answered your question, then pointed you to the Cookbook in hopes that it will explain my answer and help you "next time". – Rick James Feb 04 '16 at 16:48
0

To determine what indexes to create, you should analyse all use cases, not just one.

If you are happy to optimise this lookup functionality only, then create single column index on both code_postal and ville fields. There is no point to create a composite index, since mysql can only use a composite index for a single field lookup, if that single field is the leftmost in that index.

Since this is a relatively small table, which rarely gets updated, I would not hesitate to play around with indexes. Use mysql's explain select ... command to confirm if a newly created index is indeed used by a query.

Shadow
  • 33,525
  • 10
  • 51
  • 64