2

I have several queries that could use some optimization since they take a considerable amount of time, so after reading many of the posts on this site I started to modify my schema and adding/changing indexes to appropriately speed up queries. In most cases I had great success, however in this particular case I'm stuck and I'm not sure what I'm doing wrong.

I have a table that has about 3.5 million rows in it.

select count(*) from post;
+----------+
| count(*) |
+----------+
|  3652904 |
+----------+

and is created as follows:

Create Table: CREATE TABLE `post` (
  `id` varchar(255) NOT NULL DEFAULT '',
  `page_id` bigint(20) NOT NULL DEFAULT '0',
  `post_id` bigint(20) NOT NULL DEFAULT '0',
  `type` varchar(45) CHARACTER SET latin1 NOT NULL,
  ...
  `created_time` timestamp NULL DEFAULT NULL,
  `updated_time` timestamp NULL DEFAULT NULL,
  `timestamp` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`page_id`,`post_id`),
  KEY `created_time` (`created_time`),
  KEY `target_id` (`target_id`),
  KEY `id` (`id`) USING BTREE
) ENGINE=TokuDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY HASH (page_id)
PARTITIONS 10 */

and another table that has very few rows and is created as follow:

select count(*) from privacy;
+----------+
| count(*) |
+----------+
|    19093 |
+----------+

Create Table: CREATE TABLE `privacy` (
  `id` varchar(255) CHARACTER SET latin1 NOT NULL,
  `page_id` bigint(20) DEFAULT '0',
  `description` text CHARACTER SET latin1,
  `value` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
  `allow` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
  `deny` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
  `json` text CHARACTER SET latin1,
  `timestamp` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `page_id` (`page_id`)
) ENGINE=TokuDB DEFAULT CHARSET=utf8

The select that I'm trying to optimize is as follow:

explain partitions 
SELECT   
  post.id,  post.type,  privacy.description  
FROM   
  post 
LEFT JOIN privacy ON privacy.id = post.id  
WHERE post.page_id = 12854644836;
+----+-------------+---------+------------+------+---------------+---------+---------+-------+-------+-------+
| id | select_type | table   | partitions | type | possible_keys | key     | key_len | ref   | rows  | Extra |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+-------+-------+
|  1 | SIMPLE      | post    | p6         | ref  | PRIMARY       | PRIMARY | 8       | const | 34685 |       |
|  1 | SIMPLE      | privacy | NULL       | ALL  | NULL          | NULL    | NULL    | NULL  | 19093 |       |
+----+-------------+---------+------------+------+---------------+---------+---------+-------+-------+-------+

Unfortunately this select is taking several minutes and I can't figure out why. I noticed that the explain doesn't pick up the primary key and I wonder if this is due to the difference in charset between the two tables. Even so from the explain there should not be that many rows involved, and yet it still takes that long to execute.

16754 rows in set (5 min 33.68 sec)

Selects from the tables alone are pretty fast.

select * from post where page_id = 12854644836;
16754 rows in set (0.22 sec)

and

select * from privacy where page_id = 12854644836;
234 rows in set (0.01 sec)

Can some of the MySQL Gurus on this site point me in the right direction? Thanks :)

tmcallaghan
  • 1,292
  • 2
  • 10
  • 20

2 Answers2

2

This is because the id column is defined as latin1 in the privacy table and utf8 in the posts table.

MySQL has to do a charset conversion on the id column for the join, so it can't use an index. Change the charset and it will fix the issues.

Gavin Towey
  • 3,132
  • 15
  • 11
0

MySQL can't use multiple indexes for the same table reference within a single query. You will need to build a composite index on the posts table over (id,page_id) as both columns need to be searched for this query (the first for the join, the second for the filter criterion).

eggyal
  • 122,705
  • 18
  • 212
  • 237
  • Thank you that was very useful as well. I have now changed most my indexes on other queries to be composite and I can see speed improvements across the board. – user1509366 Jul 08 '12 at 02:33
  • You should also consider making this index clustering (or adding type to it to make it a covering index). You haven't included your entire schema for the post table so a clustering index may be too much for your needs. – tmcallaghan Jul 19 '12 at 17:18