5

I am running an application that is using tables similar to the below tables. There are one tables for articles and there is another table for tags. I want to get the latest 30 articles for a specific tag order by article id. for example "acer", the below query will do the job but it is not indexed correctly because it will scan a lot of rows if there are a lot of articles related to a specific tag. How to run a query to get the same result without scanning a large number of rows?

EXPLAIN SELECT title
FROM tag, article
WHERE tag = 'acer'
AND tag.article_id = article.id
ORDER BY tag.article_id DESC 
LIMIT 0 , 30 

Output

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  tag     ref     tag     tag     92  const   220439  Using where; Using index
1   SIMPLE  article     eq_ref  PRIMARY     PRIMARY     4   testdb.tag.article_id   1 

The flollowing is the tables and sample data:

CREATE TABLE `article` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(60) NOT NULL,
  `time_stamp` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000001 ;

-- 
-- Dumping data for table `article`
-- 

INSERT INTO `article` VALUES (1, 'Saudi Apple type D', 1313390211);
INSERT INTO `article` VALUES (2, 'Japan Apple type A', 1313420771);
INSERT INTO `article` VALUES (3, 'UAE Samsung type B', 1313423082);
INSERT INTO `article` VALUES (4, 'UAE Apple type H', 1313417337);
INSERT INTO `article` VALUES (5, 'Japan Samsung type D', 1313398875);
INSERT INTO `article` VALUES (6, 'UK Acer type B', 1313387888);
INSERT INTO `article` VALUES (7, 'Saudi Sony type D', 1313429416);
INSERT INTO `article` VALUES (8, 'UK Apple type B', 1313394549);
INSERT INTO `article` VALUES (9, 'Japan HP type A', 1313427730);
INSERT INTO `article` VALUES (10, 'Japan Acer type C', 1313400046);



CREATE TABLE `tag` (
  `tag` varchar(30) NOT NULL,
  `article_id` int(11) NOT NULL,
  UNIQUE KEY `tag` (`tag`,`article_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

-- 
-- Dumping data for table `tag`
-- 


INSERT INTO `tag` VALUES ('Samsung', 1);
INSERT INTO `tag` VALUES ('Acer', 2);
INSERT INTO `tag` VALUES ('Sony', 3);
INSERT INTO `tag` VALUES ('Apple', 4);
INSERT INTO `tag` VALUES ('Acer', 5);
INSERT INTO `tag` VALUES ('HP', 6);
INSERT INTO `tag` VALUES ('Acer', 7);
INSERT INTO `tag` VALUES ('Sony', 7);
INSERT INTO `tag` VALUES ('Acer', 7);
INSERT INTO `tag` VALUES ('Samsung', 9);
usef_ksa
  • 1,669
  • 3
  • 23
  • 40
  • Just wondering, but why are you ordering by tag.article_id instead of article.id? – Gerry Aug 15 '11 at 19:40
  • Are you able to create new tables or modify the schema? – Sean Aug 15 '11 at 19:42
  • @Gerry it is not a must,both of them are the same. it is also ok to sort by time_stamp. – usef_ksa Aug 15 '11 at 19:44
  • @Sean Yes I can change the schema. – usef_ksa Aug 15 '11 at 19:45
  • @usef_ksa Have you managed to find a solution yet? – Gerry Aug 17 '11 at 16:46
  • 1
    @Gerry Actually the query is already not scanning a lot of rows but EXPLAIN is ignoring LIMIT and because of that it showing a lot of rows examined. please read Mr Quassnoi comment below for more info. Thanks a lot – usef_ksa Aug 18 '11 at 01:17
  • Yeah I read that, but cheers for the update (+1). Did this discovery help to solve your problem as I'm curious what your solution was. – Gerry Aug 18 '11 at 14:28

4 Answers4

1

What makes you think the query will examine a large number of rows?

The query will scan exactly 30 records using the UNIQUE index on tag (tag, article_id), join the article to each record on PRIMARY KEY and stop.

This is exactly what your plan says.

I just made this test script:

CREATE TABLE `article` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(60) NOT NULL,
  `time_stamp` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000001 ;

CREATE TABLE `tag` (
  `tag` varchar(30) NOT NULL,
  `article_id` int(11) NOT NULL,
  UNIQUE KEY `tag` (`tag`,`article_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT
INTO    article
SELECT  id, CONCAT('Article ', id), UNIX_TIMESTAMP('2011-08-17' - INTERVAL id SECOND)
FROM    t_source;

INSERT
INTO    tag
SELECT  CASE fld WHEN 1 THEN CONCAT('tag', (id - 1) div 10 + 1) ELSE tag END AS tag, id
FROM    (
        SELECT  tag,
                id,
                FIELD(tag, 'Other', 'Acer', 'Sony', 'HP', 'Dell') AS fld,
                RAND(20110817) AS rnd
        FROM    (
                SELECT  'Other' AS tag
                UNION ALL
                SELECT  'Acer' AS tag
                UNION ALL
                SELECT  'Sony' AS tag
                UNION ALL
                SELECT  'HP' AS tag
                UNION ALL
                SELECT  'Dell' AS tag
                ) t
        JOIN    t_source
        ) q
WHERE   POWER(3, -fld) > rnd;

, where t_source is a table with 1M records in it, and run your query:

SELECT  *
FROM    tag t
JOIN    article a
ON      a.id = t.article_id
WHERE   t.tag = 'acer'
ORDER BY
        t.article_id DESC
LIMIT 30;

It was instant.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • I am thinking it scanning a lot of rows because explain shows there are 220439 rows scanned. Is this correct? – usef_ksa Aug 17 '11 at 11:34
  • 1
    @usef_ksa: it's a rough estimation of the number of rows satisfying your `WHERE` condition. It does not depend on the `LIMIT`: just try putting different values into the `LIMIT` clause, the value in `EXPLAIN` won't change. In reality, the query will fetch and examine `30` records from `tag` and join `30` records from `article` to them. – Quassnoi Aug 17 '11 at 12:06
0

try ANSI join syntax:

SELECT title
FROM tag t
INNER JOIN article a
    ON t.article_id = a.id
WHERE
    t.tag = 'acer'
ORDER BY 
    tag.article_id DESC
LIMIT 0 , 30

then put an index on tag.tag. Assuming you have enough selectivity on that table, and article.id is a primary key, that should be pretty zippy.

Jeremy Holovacs
  • 22,480
  • 33
  • 117
  • 254
  • It is the same.still the same problem exists – usef_ksa Aug 15 '11 at 19:53
  • what percentage of tags have a tag of "acer"? If it's more than 5% or so, you're going to be scanning. --Actually, having just read the data you are inserting, you will not be able to avoid a scan. An index seek needs much higher selectivity. – Jeremy Holovacs Aug 15 '11 at 19:54
0

Edit: Add this index

UNIQUE KEY tag (article_id,tag)
Gerry
  • 10,584
  • 4
  • 41
  • 49
  • Also switching to "ORDER BY article.id DESC" might give a slight benefit (or a decent if you don't add the index above, depending on the size of the tag table) although I don't know enough about MYSQL internals to be sure about that. – Gerry Aug 15 '11 at 20:01
  • I added the index but still the same problem. I forced it to use the new index but it is scanning more rows. – usef_ksa Aug 15 '11 at 20:11
  • Did you try UNIQUE KEY tag (article_id,tag)? How many rows is it scanning now and what is the output of Extra? Is it still using where? – Gerry Aug 15 '11 at 20:17
  • Also make the orderby change just in case. – Gerry Aug 15 '11 at 20:19
  • Yes but it still scanning a lot of rows. it is scanning 750K out of 1 million rows.it is scanning less number of rows than the article_id index.This is extra info: Using where; Using index – usef_ksa Aug 15 '11 at 20:34
  • It should only say "Using index" if set up correctly. "Using where;" means the index isn't being used initially. – Gerry Aug 15 '11 at 20:37
  • 1
    I tested it. Actually it must use where because the first part of index will sort the ids and after that sort by the tag.There are some ids have more than one tag. so if removed tag = 'acer' from the query it will use the index only. – usef_ksa Aug 15 '11 at 21:04
0

I would suggest modifying the storage engine and schema to utilize foreign keys.

CREATE TABLE `article` (
  `id` int(11) NOT NULL auto_increment,
  `title` varchar(60) NOT NULL,
  `time_stamp` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000001 ;

CREATE TABLE `tag` (
 `id` int(11) NOT NULL auto_increment,
 `tag` varchar(30) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `article_tag` (
 `id` int(11) NOT NULL auto_increment,
 `article_id` int(11) NOT NULL,
 `tag_id` int(11) NOT NULL,
 PRIMARY KEY (`id`),
 FOREIGN KEY (`article_id`) REFERENCES article(id),
 FOREIGN KEY (`tag_id`) REFERENCES tag(id)
) ENGINE=Innodb;

Which results in a query like so:

EXPLAIN 
SELECT * FROM article 
    JOIN article_tag ON article.id = article_tag.id 
    JOIN tag ON article_tag.tag_id = tag.id 
WHERE tag.tag="Acer";
+----+-------------+-------------+--------+----------------+---------+---------+-------------------------+------+-------------+
| id | select_type | table       | type   | possible_keys  | key     | key_len | ref                     | rows | Extra       |
+----+-------------+-------------+--------+----------------+---------+---------+-------------------------+------+-------------+
|  1 | SIMPLE      | article_tag | ALL    | PRIMARY,tag_id | NULL    | NULL    | NULL                    |    1 |             |
|  1 | SIMPLE      | tag         | eq_ref | PRIMARY        | PRIMARY | 4       | temp.article_tag.tag_id |    1 | Using where |
|  1 | SIMPLE      | article     | eq_ref | PRIMARY        | PRIMARY | 4       | temp.article_tag.id     |    1 |             |
+----+-------------+-------------+--------+----------------+---------+---------+-------------------------+------+-------------+
3 rows in set (0.00 sec)
Sean
  • 671
  • 5
  • 11
  • I can change the schema but I cannot change the Engine to Innodb. I must use MyISAM. – usef_ksa Aug 15 '11 at 20:14
  • Because currently I am more familiar with MyISAM and also it support full text search and I am using this feature in my application. I believe there is a way to solve this problem with MyISAM. I think Changing the table Engine needs a downtime and a lot of changes. – usef_ksa Aug 15 '11 at 20:48
  • 1
    You do realize that MySQL has switched over to InnoDB as the default storage engine after MySQL 5.5.5 - which was nearly a year ago, right? – Sean Aug 15 '11 at 21:50
  • +1 for this comment.Yes I know that, it seems InnoDB is good choice for the future 'and maybe the current'. I will test my application using InnoDB in future. Now I have to fix this problem using MyISAM because My site is under heavy traffic now. – usef_ksa Aug 16 '11 at 22:23