I have a photo gallery on my site with 1M photos in it. There are 2 search tables associated with it. Table #1 contains a list of words used in the photos. Table #2 contains a list of which words match up with which photos; Table #2 is 7M rows. I am testing partitioning this 7M row table because I have another set of tables with 120,000,000 rows. Queries against the 120M row wordmatch table below, with or without a join against the wordlist table below, take multiple seconds to run.

I am trying to perform a join between these 2 tables and MySQL 5.6 EXPLAIN PARTITIONS shows it is using all the partitions. How can I redo this query to make this correctly use only a single partition?

The 2 tables:

CREATE TABLE wordlist (
  word_text varchar(50) NOT NULL DEFAULT '',
  word_id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (word_text),
  KEY word_id (word_id)
) ENGINE=InnoDB;

CREATE TABLE wordmatch (
  pic_id int(11) unsigned NOT NULL DEFAULT '0',
  word_id mediumint(8) unsigned NOT NULL DEFAULT '0',
  title_match tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (word_id,pic_id,title_match),
  KEY pic_id (pic_id)
) ENGINE=InnoDB
/*!50100 PARTITION BY HASH (word_id)
PARTITIONS 11 */;

SQL query I am performing:

EXPLAIN PARTITIONS SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon' AND m.word_id = w.word_id 
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
| id | select_type | table | partitions                        | type  | possible_keys   | key     | key_len | ref                        | rows | Extra       |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+
|  1 | SIMPLE      | w     | NULL                              | range | PRIMARY,word_id | PRIMARY | 52      | NULL                       |    1 | Using where |
|  1 | SIMPLE      | m     | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10 | ref   | PRIMARY         | PRIMARY | 3       | w.word_id                  |   34 | Using index |
+----+-------------+-------+-----------------------------------+-------+-----------------+---------+---------+----------------------------+------+-------------+

The join produces a query that uses all partitions. If I retrieve the word_id # first and go straight against the wordmatch table, everything is ok:

EXPLAIN PARTITIONS SELECT m.pic_id FROM wordmatch m WHERE m.word_id = 219657;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref   | rows  | Extra       |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+
|  1 | SIMPLE      | m     | p9         | ref  | PRIMARY       | PRIMARY | 3       | const | 18220 | Using index |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+-------+-------------+

How do I get this to work correctly? I prefer not to split this into multiple queries if possible. You may have noticed I am using LIKE above. People will often search on `bacon%` to get plurals of words, etc. Example:

SELECT m.pic_id FROM wordlist w, wordmatch m WHERE w.word_text LIKE 'bacon%' AND m.word_id = w.word_id

I realize this wildcard search may result in 2 or more partitions being selected. This is probably ok, although if there is a way to change the partitioning to prevent that, I welcome any tips.

Edit #1: Added details as my original question was confusing. I was testing my 7M row table first before doing my 120M row table.

Edit #2: Resolution to my overall issue: my performance problems appear to be resolved now that I have partitioned my 120M row table into 101 partitions, per this post: MySQL performance: partitions. I do not know whether MySQL goes against all the partitions at runtime - Ollie Jones says in the comments below that it does not, and that EXPLAIN PARTITIONS is misleading on this point - but it is fast now, so I am happy.
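
For reference, the repartitioning statement was along these lines - a sketch only, reusing the wordmatch name as a stand-in for my 120M row table, which has the same shape:

ALTER TABLE wordmatch
  PARTITION BY HASH (word_id)  -- same partitioning key as above
  PARTITIONS 101;              -- 101 partitions, per the linked post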

63bus
  • Just a remark: I believe seven megarows is not big enough to justify partitioning for a table like the one you describe. You'll carry a more-or-less permanent sysadmin burden and query overhead burden if you partition the table. May I suggest you spend your effort indexing the `wordmatch` table? – O. Jones Nov 04 '14 at 19:24
  • Thank you. What indexes would I change or add? I should have mentioned that I am actually using this table as a test before I do the same for another table, which has 120 million rows. – 63bus Nov 04 '14 at 20:38

2 Answers

Getting your query working with efficient indexing is probably a good idea before you dive into the partitioning project. Here's your query refactored to use an explicit JOIN:

SELECT m.pic_id 
  FROM wordlist w
  JOIN wordmatch m ON w.word_id = m.word_id
 WHERE w.word_text LIKE 'bacon%' 

This query can use a compound index on wordlist (word_text, word_id). It will random-access the index for the first matching word_text, and then scan the index, retrieving the word_id values, until it gets past the last matching word_text.
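
If you don't already have that compound index, it would look something like this (the index name word_text_id is just my choice):

ALTER TABLE wordlist
  ADD INDEX word_text_id (word_text, word_id);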

It can also use your existing primary key on wordmatch (word_id, pic_id). This speeds up your query because the database engine can satisfy it directly from the index, without having to bat the hard drive back and forth to the table itself.

So, give those indexes a try. Your large table, the wordmatch table, should work fairly well without partitioning. It's more common to partition tables that contain lots of content (like the text of articles) than it is to partition this kind of fixed-row-size join table.

Notice that your EXPLAIN announces it will look at all the partitions because, at planning time, it can't tell which word_id values your w.word_text LIKE 'bacon%' WHERE-clause will turn up, and word_id is the partitioning key. EXPLAIN isn't as dumb as a box of hammers, but it is close. MySQL won't examine the partitions it doesn't need to; it just doesn't know which partitions are involved until runtime.

Have you considered using FULLTEXT search? It might simplify what you're doing.
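
A sketch of what that could look like - note that the photos table and caption column here are hypothetical stand-ins, since they're not part of the schema you posted (InnoDB supports FULLTEXT as of MySQL 5.6):

-- Hypothetical: assumes the photo captions live in photos.caption.
ALTER TABLE photos ADD FULLTEXT INDEX ft_caption (caption);

SELECT pic_id
  FROM photos
 WHERE MATCH (caption) AGAINST ('bacon' IN NATURAL LANGUAGE MODE);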

O. Jones
  • Good answer. Does a better job of explaining alternate options than mine does (especially the recommendation on the FULLTEXT index, which I only got to in back-and-forth comments on my answer). This is the better answer for posterity. – Mike Brant Nov 04 '14 at 21:14
  • @Ollie Thank you, I altered the index for wordlist. Working with the 120M table now - unfortunately no speed change. Questions: 1. How is this index change different than just `word_text` alone? I don't need the word_ids sorted, and a single `word_text` index will already group each word together, right? 2. Your reworked query using `JOIN` is no different than the `,` join I am performing, correct? Yes, I am considering a FULLTEXT search as another option. I was not aware of the EXPLAIN vs. runtime differences, so thanks for that too. – 63bus Nov 05 '14 at 16:03

Your first query doesn't have any filtering condition on the wordmatch table itself that could limit the partitions in use, so it needs to access all of them. There is no way to redo this query to use only the necessary partitions without adding a filter on the field that is the basis for the partitioning (word_id).
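
If you do want pruning, one workaround is to resolve the word_id values first and feed them back in as constants - a sketch, where the second word_id value is made up for illustration:

-- Step 1: resolve the matching word_ids (uses the wordlist primary key).
SELECT word_id FROM wordlist WHERE word_text LIKE 'bacon%';

-- Step 2: suppose that returned 219657 and 219658 (the second value is
-- hypothetical). With constants, the optimizer can prune down to the
-- partitions that actually hold those hash values.
SELECT pic_id FROM wordmatch WHERE word_id IN (219657, 219658);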

The second query filters on a specific word_id value, so the optimizer knows exactly which partition to look in.

I would also agree with the comment made by @OllieJones that I am not sure you should really worry about partitioning at only 7M rows. That is not really that big of a table in the grand schema of things.

Mike Brant
  • Wow, so the query execution planner doesn't know to look up the wordlist.word_id first to know which partition to use for wordmatch.word_id. So the only solution is to split this into multiple queries, correct? I am doing this set of tables as a test before I do the same for another set of tables where wordmatch contains 120 million rows. I will edit the original post. – 63bus Nov 04 '14 at 20:49
  • There are some delays from time to time in retrieving records from the 7M table; perhaps it is not related to the size. I can run a query and it will take 7-8 seconds the first time, but the next one will take .008 seconds, almost like it has to load it into memory or ramp up somehow? I have 9GB assigned to InnoDB and the InnoDB tables are 7GB total, so it should not be a memory issue. Any advice welcome. – 63bus Nov 04 '14 at 20:49
  • @63bus At the planning stage, the planner doesn't yet know which word_id's would match the criteria used for `w.word_text` so it doesn't know which partitions will be needed. – Mike Brant Nov 04 '14 at 21:00
  • @63bus A lot of this might have to do with your search criteria. If you use something such as `LIKE 'bacon'` (which is really the same as `= 'bacon'`) or `LIKE 'bacon%'`, the index on `word_text` can be used. If you use `LIKE '%bacon%'` then you will not be able to use the index at all, since index matching starts from the beginning of the string on such a field (see the sketch after this comment thread). You might be getting slow queries followed by fast queries in that the query hits the query cache the second time around. – Mike Brant Nov 04 '14 at 21:04
  • @63bus For this type of problem, you might consider using natural language search. Link to the natural language search documentation: http://dev.mysql.com/doc/refman/5.6/en/fulltext-natural-language.html . With this approach, you can use a FULLTEXT index, which is better suited for this sort of query. – Mike Brant Nov 04 '14 at 21:05
  • Thank you. Yes, I am considering a FULLTEXT search as another option. I am not doing any `%bacon%` type searches but this is a good warning to mention, I actually disabled that due to speed issues. I am not using the query cache but I agree this would cause a speed increase after the 1st query. – 63bus Nov 05 '14 at 15:50
  • @63bus If you are not doing wildcard searches, then why are you not simply using `= 'bacon'` instead of LIKE? If you are not searching for words amongst larger text but rather just making exact matches, then FULLTEXT may not be useful to you after all. – Mike Brant Nov 05 '14 at 17:39
  • Users are doing `bacon%` type searches, just no leading `%` characters. – 63bus Nov 05 '14 at 18:22
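
A quick sketch of the index-prefix rule @MikeBrant describes above (illustrative patterns only):

-- Prefix pattern: the btree index on word_text can serve this as a range scan.
SELECT word_id FROM wordlist WHERE word_text LIKE 'bacon%';

-- Leading wildcard: matching can no longer start at the beginning of the
-- indexed string, so this falls back to scanning every row.
SELECT word_id FROM wordlist WHERE word_text LIKE '%bacon%';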