16

Here are my tables:

CREATE TABLE `articles` (
    `id` int(10) unsigned not null auto_increment,
    `author_id` int(10) unsigned not null,
    `date_created` datetime not null,
    PRIMARY KEY(id)
) ENGINE=InnoDB;

CREATE TABLE `article_contents` (
    `article_id` int(10) unsigned not null,
    `title` varchar(100) not null,
    `content` text not null,
    PRIMARY KEY(article_id)
) ENGINE=InnoDB;

CREATE TABLE `article_images` (
    `article_id` int(10) unsigned not null,
    `filename` varchar(100) not null,
    `date_added` datetime not null,
    UNIQUE INDEX(article_id, filename)
) ENGINE=InnoDB;

Every article can have one or more images associated with it. I'd like to display the last 40 written articles on a page, along with the most recent image associated with the article. What I can't figure out is how to join with the article_images table, and only retrieve a single row.

Edit: It's important that the solution performs well. The solutions I've seen so far -- which use derived tables -- take a minute or more to complete.

mellowsoon
  • 22,273
  • 19
  • 57
  • 75
  • 2
    Why do you have separate ```articles``` and ```article_contents``` tables? It seems to be a 1-to-1 relationship, aka, a record. Why not move `title` and `content` to the ```articles``` table? If you need to support empty articles, create a default value for `title` and `content` of `''`. – Bacon Bits May 16 '11 at 03:58
  • What you need is a better indexing scheme along with a query that has only 40 rows in the deepest subquery. My answer provides both. I had to actually generate my own sample data using stored procedures. – RolandoMySQLDBA May 17 '11 at 12:01
  • I just added the date_added field to the query to show the date_added of the latest image for each article. Query and output are at the bottom. – RolandoMySQLDBA May 17 '11 at 19:43
  • 1
    @Bacon Bits - I've broken up the table for performance reasons, although I admin my thinking on how MySQL's query cache -- and the OS's cache -- works may be incorrect. The idea is keep all the columns that are used in the WHERE clauses, along with the columns used in the GROUP BY and ORDER BY clauses all in a small table, that only contains numeric and datetime columns. The hope is this table will be small enough to fit completely in memory, while the much larger article_contents table can sit on disk. – mellowsoon May 18 '11 at 17:55
  • Your question has caused some kind of buzz. The question has presented a challenge worthy of such. Even after you accepted my answer, another answer was still submitted. I guess query speed and table design are what developers should be seeing and those who are passionate enough to improve things will certainly try. +1 on your question for inspiring developers to think along these lines !!! – RolandoMySQLDBA May 20 '11 at 13:07

11 Answers11

25

After looking over the other answers, it helped me realize an underlying problem.

The articles table and article_images table both need to have an additional index.

This is what you have now:

CREATE TABLE `articles` (
    `id` int(10) unsigned not null auto_increment,
    `author_id` int(10) unsigned not null,
    `date_created` datetime not null,
    PRIMARY KEY(id)
) ENGINE=InnoDB;

CREATE TABLE `article_images` (
    `article_id` int(10) unsigned not null,
    `filename` varchar(100) not null,
    `date_added` datetime not null,
    UNIQUE INDEX(article_id, filename)
) ENGINE=InnoDB;

This is what you need:

CREATE TABLE `articles` (
    `id` int(10) unsigned not null auto_increment,
    `author_id` int(10) unsigned not null,
    `date_created` datetime not null,
    PRIMARY KEY(id),
    INDEX (date_created DESC)
) ENGINE=InnoDB;

CREATE TABLE `article_images` (
    `article_id` int(10) unsigned not null,
    `filename` varchar(100) not null,
    `date_added` datetime not null,
    UNIQUE INDEX (article_id, filename),
    INDEX (article_id, date_added),
) ENGINE=InnoDB;

articles
The new index for ordering the articles by insertion date in descending order

article_images
The first index is still needed since it will guard against attaching the another image with the same filename to an article.
The second index will make finding the most recent image simple,

With those new indexes in place, here is the query that will do your bidding:

  SELECT
    AAA.author_id,
    AAA.date_created,
    IFNULL(BBB.title,'<NO_TITLE>') title,
    IFNULL(CCC.filename,'<NO-IMAGE>') filename
  FROM
  (
    SELECT
      AA.id,
      AA.date_added,
      BB.author_id,
      BB.date_created
    FROM
    (
      SELECT
        A.id,IFNULL(MAX(B.date_added),'1900-01-01 00:00:00') date_added
        FROM (SELECT id FROM articles ORDER BY date_created DESC LIMIT 40) A
        LEFT JOIN article_images B ON A.id = B.article_id
        GROUP BY A.id
    ) AA
    INNER JOIN articles BB USING (id)
  ) AAA
  LEFT JOIN article_contents BBB ON AAA.id=BBB.article_id
  LEFT JOIN article_images CCC
  ON (AAA.id=CCC.article_id AND AAA.date_added=CCC.date_added)
  ORDER BY AAA.date_created DESC;

The goal is first to create an internal subquery that only has 40 rows. That is what AA should bring back. Subquery AAA should have the article info plus the date the last image was inserted. The last set of joins should connect the 40 ids with a title and an image. The final step is to present the result set in descensding order.

Since I do not have sample data, I wrote three stored procedures in a sample database called stuff

1 to make the tables

DELIMITER $$

DROP PROCEDURE IF EXISTS `stuff`.`MakeTables` $$
CREATE PROCEDURE `stuff`.`MakeTables` ()
BEGIN

DROP TABLE IF EXISTS articles;
DROP TABLE IF EXISTS article_contents;
DROP TABLE IF EXISTS article_images;

CREATE TABLE `articles` (
    `id` int(10) unsigned not null auto_increment,
    `author_id` int(10) unsigned not null,
    `date_created` datetime not null,
    PRIMARY KEY(id),
    INDEX (date_created DESC)
) ENGINE=InnoDB;

CREATE TABLE `article_contents` (
    `article_id` int(10) unsigned not null,
    `title` varchar(100) not null,
    `content` text not null,
PRIMARY KEY(article_id)) ENGINE=InnoDB;

CREATE TABLE `article_images` (
    `article_id` int(10) unsigned not null,
    `filename` varchar(100) not null,
    `date_added` datetime not null,
    UNIQUE INDEX (article_id, filename),
    INDEX (article_id, date_added)
) ENGINE=InnoDB;

END $$

DELIMITER ;

1 to make the data

DELIMITER $$

DROP PROCEDURE IF EXISTS `stuff`.`LoadSampleData` $$
CREATE PROCEDURE `stuff`.`LoadSampleData` ()
BEGIN

    DECLARE x,y,z INT;

    SET x = 1;
    WHILE x <= 100 DO
        INSERT INTO articles (author_id,date_created) VALUES
        (RAND() * POWER(2,31),
        DATE('1970-01-01 00:00:00') + INTERVAL (RAND() * POWER(2,30)) SECOND);
        SET x = x + 1;
    END WHILE;

    SET x = 1;
    WHILE x <= 100 DO
        SET y = FLOOR(RAND() * 100);
        IF y >= 30 THEN
            INSERT INTO article_contents
            VALUES (x,
            CONCAT('TITLE_',FLOOR(RAND() * POWER(2,31))),
            CONCAT('CONTENT_',FLOOR(RAND() * POWER(2,31))));
        END IF;
        SET x = x + 1;
    END WHILE;

    SET x = 1;
    WHILE x <= 100 DO
        SELECT COUNT(1) INTO y FROM article_contents WHERE article_id = x;
        IF y = 1 THEN
            SET y = 0;
            WHILE y < 20 DO
                SET y = y + 1;
                SET z = FLOOR(RAND() * 10);
                IF z >= 5 THEN
                    INSERT INTO article_images
                    VALUES (x,
                    CONCAT('IMAGE_',FLOOR(RAND() * POWER(2,31))),
                    DATE('1970-01-01 00:00:00') + INTERVAL (RAND() * POWER(2,30)) SECOND);
                END IF;
            END WHILE;
        END IF;
        SET x = x + 1;
    END WHILE;

    SELECT COUNT(1) INTO x FROM articles;
    SELECT COUNT(1) INTO y FROM article_contents;
    SELECT COUNT(1) INTO z FROM article_images;

    SELECT CONCAT('Articles:',x,' Titles:',y,' Images: ',z) Results;

END $$

DELIMITER ;

1 to run the query

DELIMITER $$

DROP PROCEDURE IF EXISTS `stuff`.`ShowLast40` $$
CREATE PROCEDURE `stuff`.`ShowLast40` ()
BEGIN

  SELECT
    AAA.author_id,
    AAA.date_created,
    IFNULL(BBB.title,'<NO_TITLE>') title,
    IFNULL(CCC.filename,'<NO-IMAGE>') filename
  FROM
  (
    SELECT
      AA.id,
      AA.date_added,
      BB.author_id,
      BB.date_created
    FROM
    (
      SELECT
        A.id,IFNULL(MAX(B.date_added),'1900-01-01 00:00:00') date_added
        FROM (SELECT id FROM articles ORDER BY date_created DESC LIMIT 40) A
        LEFT JOIN article_images B ON A.id = B.article_id
        GROUP BY A.id
    ) AA
    INNER JOIN articles BB USING (id)
  ) AAA
  LEFT JOIN article_contents BBB ON AAA.id=BBB.article_id
  LEFT JOIN article_images CCC
  ON (AAA.id=CCC.article_id AND AAA.date_added=CCC.date_added)
  ORDER BY AAA.date_created DESC;

END $$

DELIMITER ;

Here is an example output when I ran it on my Windows Desktop (MySQL 5.5.12):

mysql> call maketables; call loadsampledata;
Query OK, 0 rows affected (0.31 sec)

+------------------------------------+
| Results                            |
+------------------------------------+
| Articles:100 Titles:67 Images: 666 |
+------------------------------------+
1 row in set (23.73 sec)

mysql> call showlast40;
+------------+---------------------+------------------+------------------+
| author_id  | date_created        | title            | filename         |
+------------+---------------------+------------------+------------------+
| 1576679519 | 2004-01-02 14:05:17 | TITLE_1757853515 | IMAGE_1601858163 |
|  992840519 | 2003-05-17 05:48:11 | TITLE_319026350  | IMAGE_1913708631 |
| 1366977197 | 2003-03-20 19:37:23 | <NO_TITLE>       | <NO-IMAGE>       |
| 1705517382 | 2003-02-07 16:48:56 | <NO_TITLE>       | <NO-IMAGE>       |
| 1529899841 | 2002-11-02 20:59:26 | <NO_TITLE>       | <NO-IMAGE>       |
| 1756824942 | 2002-08-28 16:01:59 | <NO_TITLE>       | <NO-IMAGE>       |
|  175825630 | 2002-05-08 13:48:56 | TITLE_240812804  | IMAGE_1214850809 |
|  757530551 | 2002-02-08 00:20:17 | TITLE_4447486    | IMAGE_1511850161 |
|  840251261 | 2002-01-25 20:06:56 | TITLE_1160842143 | IMAGE_206027488  |
|  964653347 | 2001-12-26 19:15:47 | TITLE_1552408257 | IMAGE_717719932  |
| 2106039126 | 2001-11-11 17:09:29 | <NO_TITLE>       | <NO-IMAGE>       |
| 2085991608 | 2001-08-01 12:48:20 | <NO_TITLE>       | <NO-IMAGE>       |
| 1895462094 | 2000-12-02 05:31:41 | <NO_TITLE>       | <NO-IMAGE>       |
| 1589384729 | 2000-04-28 23:55:50 | TITLE_1040850308 | IMAGE_1200414639 |
|  514341550 | 2000-04-20 07:25:05 | TITLE_188288840  | IMAGE_164856430  |
|  887359583 | 2000-02-13 03:30:47 | <NO_TITLE>       | <NO-IMAGE>       |
| 1156687499 | 1999-06-16 00:29:17 | TITLE_686398996  | IMAGE_670200418  |
| 1561242593 | 1998-12-08 05:50:17 | <NO_TITLE>       | <NO-IMAGE>       |
| 1117889993 | 1998-10-23 17:02:44 | TITLE_1491217221 | IMAGE_649630126  |
|  740063135 | 1998-09-16 23:52:41 | TITLE_579374776  | IMAGE_757313192  |
|  429699232 | 1998-04-19 01:41:17 | TITLE_73748980   | IMAGE_1881818111 |
| 1827051060 | 1998-02-27 01:01:50 | TITLE_1588619991 | IMAGE_1657322715 |
| 1442984429 | 1997-11-19 21:23:35 | TITLE_184173382  | IMAGE_597809368  |
|  152267158 | 1997-05-02 20:25:50 | <NO_TITLE>       | <NO-IMAGE>       |
| 1323598169 | 1997-03-14 16:30:38 | TITLE_1355869397 | IMAGE_1058313818 |
|   66122740 | 1997-01-05 15:12:20 | TITLE_1259073183 | IMAGE_198280936  |
|    5161474 | 1996-06-28 10:47:26 | TITLE_1876022823 | IMAGE_1138098675 |
| 1865082792 | 1996-03-01 19:09:11 | TITLE_1288151615 | IMAGE_245974646  |
| 1923481146 | 1995-08-07 00:36:11 | TITLE_922744000  | IMAGE_2067090321 |
| 1725218958 | 1995-03-18 05:15:29 | TITLE_583120586  | IMAGE_592773824  |
|  117806248 | 1995-01-05 02:34:32 | <NO_TITLE>       | <NO-IMAGE>       |
| 1428777335 | 1993-06-06 01:52:32 | TITLE_661148588  | IMAGE_633345518  |
| 1091245943 | 1993-06-05 05:51:47 | TITLE_1407444563 | IMAGE_538936256  |
| 2088382260 | 1993-03-25 06:03:29 | TITLE_1144364681 | IMAGE_1790013089 |
|  625878569 | 1992-12-21 07:41:26 | TITLE_1319355723 | IMAGE_921580624  |
|  110555110 | 1992-01-01 20:49:59 | <NO_TITLE>       | <NO-IMAGE>       |
| 1110532475 | 1991-11-20 07:19:32 | <NO_TITLE>       | <NO-IMAGE>       |
| 1726795146 | 1990-10-09 00:23:44 | TITLE_782624350  | IMAGE_1760322575 |
|  370183888 | 1990-03-30 15:59:17 | <NO_TITLE>       | <NO-IMAGE>       |
| 1497483317 | 1990-02-19 01:25:41 | TITLE_776483948  | IMAGE_665824222  |
+------------+---------------------+------------------+------------------+
40 rows in set (0.00 sec)

Give it a Try !!!

UPDATE

I made absolutely sure that the 40 datetimes you are reading are in fact the top 40. They are. I ran the query : select * from articles ORDER BY date_created DESC; to make sure.

UPDATE 2011-05-17 14:06

mysql> call maketables; call loadsampledata;
Query OK, 0 rows affected (0.45 sec)

+-------------------------------------+
| Results                             |
+-------------------------------------+
| Articles:100 Titles:67 Images: 6739 |
+-------------------------------------+
1 row in set (3 min 45.45 sec)

Query OK, 0 rows affected (3 min 45.45 sec)

mysql> call showlast40;
+------------+---------------------+------------------+------------------+
| author_id  | date_created        | title            | filename         |
+------------+---------------------+------------------+------------------+
|  196582776 | 2004-01-05 14:09:04 | <NO_TITLE>       | <NO-IMAGE>       |
| 1880371016 | 2003-07-31 05:50:37 | TITLE_1191518827 | IMAGE_1562208019 |
|   22204986 | 2003-02-16 14:09:22 | <NO_TITLE>       | <NO-IMAGE>       |
|  355490160 | 2002-11-21 02:35:19 | <NO_TITLE>       | <NO-IMAGE>       |
|  869510149 | 2001-12-27 22:07:52 | TITLE_1528616779 | IMAGE_223327284  |
| 2063556512 | 2001-04-16 18:47:46 | TITLE_1839975091 | IMAGE_1282187005 |
|  529754190 | 2000-07-14 19:44:01 | TITLE_1557423205 | IMAGE_1931606737 |
|  166226262 | 1999-11-08 03:27:22 | <NO_TITLE>       | <NO-IMAGE>       |
| 1981417562 | 1999-09-11 12:59:10 | TITLE_198262896  | IMAGE_1491273871 |
|  831057001 | 1999-06-14 15:06:31 | TITLE_1170272131 | IMAGE_760396200  |
| 1454252623 | 1998-06-02 08:35:46 | <NO_TITLE>       | <NO-IMAGE>       |
| 1435450777 | 1997-11-17 18:10:34 | TITLE_482497458  | IMAGE_1331932705 |
| 1536315541 | 1997-11-02 05:24:49 | <NO_TITLE>       | <NO-IMAGE>       |
| 2078028530 | 1997-03-14 22:36:58 | TITLE_321332010  | IMAGE_1897983295 |
|  701651581 | 1997-01-13 22:36:58 | TITLE_1337390701 | IMAGE_1630983859 |
|  101442444 | 1996-11-22 09:40:16 | <NO_TITLE>       | <NO-IMAGE>       |
|   51114930 | 1996-11-20 03:24:49 | TITLE_1866751135 | IMAGE_1669595407 |
|  722056183 | 1996-08-03 15:23:01 | <NO_TITLE>       | <NO-IMAGE>       |
| 1178720989 | 1996-06-29 22:47:19 | TITLE_579734376  | IMAGE_833229222  |
|  511355958 | 1996-03-10 09:32:46 | TITLE_1540275289 | IMAGE_1168117261 |
|  831921829 | 1996-01-31 06:36:04 | TITLE_661038882  | IMAGE_1199197195 |
| 1288455163 | 1995-08-22 00:34:25 | TITLE_1599332515 | IMAGE_822445764  |
| 1976208956 | 1995-06-02 09:23:01 | TITLE_58372998   | IMAGE_793318650  |
| 2092066982 | 1995-03-28 20:35:37 | TITLE_1693142377 | IMAGE_1176935479 |
| 1083841549 | 1994-07-11 18:46:52 | TITLE_1955674591 | IMAGE_1240485919 |
|  359037132 | 1994-07-11 02:44:19 | TITLE_713454936  | IMAGE_1072569732 |
| 1471985773 | 1994-05-10 17:08:01 | TITLE_1065017724 | IMAGE_393097704  |
| 1888864730 | 1994-01-15 17:41:28 | TITLE_1060275498 | IMAGE_230810100  |
| 1688028488 | 1993-12-28 06:36:58 | <NO_TITLE>       | <NO-IMAGE>       |
| 1739777948 | 1993-02-15 00:30:31 | TITLE_1226842225 | IMAGE_1615058467 |
|  445721334 | 1991-12-15 20:54:49 | TITLE_1336145587 | IMAGE_2114729323 |
| 1661002442 | 1991-06-30 05:49:34 | TITLE_151142910  | IMAGE_1623325381 |
| 2092223006 | 1991-06-13 13:15:58 | TITLE_33175860   | IMAGE_1225117771 |
| 1553434585 | 1991-01-12 03:34:25 | TITLE_728483442  | IMAGE_1954153339 |
|  528544608 | 1990-11-10 08:21:04 | <NO_TITLE>       | <NO-IMAGE>       |
| 1043927395 | 1990-10-05 00:48:49 | TITLE_304307448  | IMAGE_1702062493 |
| 1685702960 | 1990-04-28 05:44:19 | TITLE_1909853341 | IMAGE_263553036  |
| 1392428383 | 1990-03-07 15:08:46 | <NO_TITLE>       | <NO-IMAGE>       |
|  643714153 | 1990-02-14 08:32:10 | TITLE_837416724  | IMAGE_1673964259 |
| 2132028206 | 1989-09-28 16:04:07 | TITLE_614908878  | IMAGE_1362210487 |
+------------+---------------------+------------------+------------------+
40 rows in set (0.01 sec)

Everything works as I published before even with a larger set of images. Here is the query from the ShowLast40 Stored Procedure:

  SELECT
    AAA.author_id,
    AAA.date_created,
    IFNULL(BBB.title,'<NO_TITLE>') title,
    IFNULL(CCC.filename,'<NO-IMAGE>') filename
  FROM
  (
    SELECT
      AA.id,
      AA.date_added,
      BB.author_id,
      BB.date_created
    FROM
    (
      SELECT
        A.id,IFNULL(MAX(B.date_added),'1900-01-01 00:00:00') date_added
        FROM (SELECT id FROM articles ORDER BY date_created DESC LIMIT 40) A
        LEFT JOIN article_images B ON A.id = B.article_id
        GROUP BY A.id
    ) AA
    INNER JOIN articles BB USING (id)
  ) AAA
  LEFT JOIN article_contents BBB ON AAA.id=BBB.article_id
  LEFT JOIN article_images CCC
  ON (AAA.id=CCC.article_id AND AAA.date_added=CCC.date_added)
  ORDER BY AAA.date_created DESC;

UPDATE 2011-05-17 15:16

Just for laughs, I increased the sample and reran it. Same query speed.

mysql> call maketables; call loadsampledata; call showlast40;
Query OK, 0 rows affected (0.38 sec)

+----------------------------------------+
| Results                                |
+----------------------------------------+
| Articles:1000 Titles:886 Images: 88596 |
+----------------------------------------+
1 row in set (51 min 22.29 sec)

Query OK, 0 rows affected (51 min 22.29 sec)

+------------+---------------------+------------------+------------------+
| author_id  | date_created        | title            | filename         |
+------------+---------------------+------------------+------------------+
| 1464539515 | 2004-01-07 22:45:04 | TITLE_676010724  | IMAGE_1877060293 |
|  272558724 | 2004-01-03 23:44:19 | TITLE_1213504045 | IMAGE_550812606  |
| 2000476448 | 2003-12-28 18:05:10 | TITLE_1762951489 | IMAGE_1201290847 |
|  955209697 | 2003-12-25 00:55:43 | TITLE_1064749344 | IMAGE_1335865507 |
| 1657429856 | 2003-12-19 01:03:13 | TITLE_1931852743 | IMAGE_905288424  |
|  759381001 | 2003-11-12 10:46:52 | TITLE_878255772  | IMAGE_2014780795 |
| 1269478951 | 2003-11-06 02:06:22 | TITLE_2026098781 | IMAGE_982272966  |
| 1049672131 | 2003-10-04 20:55:34 | TITLE_2043080215 | IMAGE_987859662  |
| 1429108729 | 2003-09-16 19:07:52 | TITLE_424483080  | IMAGE_35379150   |
| 1672198676 | 2003-09-13 11:49:52 | TITLE_1131552745 | IMAGE_875049630  |
| 1645878842 | 2003-08-24 13:42:04 | TITLE_1077302833 | IMAGE_702269538  |
|  172347180 | 2003-08-21 14:26:37 | TITLE_558691044  | IMAGE_1091183587 |
| 1137674509 | 2003-08-15 08:44:37 | TITLE_1982979709 | IMAGE_1234487941 |
|  282998112 | 2003-08-05 10:01:34 | TITLE_353831568  | IMAGE_738487608  |
|  246145344 | 2003-08-02 00:42:31 | TITLE_376954044  | IMAGE_1279375459 |
|  218409162 | 2003-07-14 02:55:16 | TITLE_1932540991 | IMAGE_1078689211 |
|  593263087 | 2003-07-12 22:47:01 | TITLE_1604012533 | IMAGE_834822870  |
| 2115914174 | 2003-07-06 03:06:31 | TITLE_1268165545 | IMAGE_1068632322 |
|  552557275 | 2003-07-01 16:45:22 | TITLE_2022112717 | IMAGE_1410588295 |
| 1500437041 | 2003-06-29 20:05:19 | TITLE_35559258   | IMAGE_159953586  |
| 1098371257 | 2003-06-09 07:29:37 | TITLE_1694076415 | IMAGE_1409619391 |
| 1570373503 | 2003-05-22 16:45:04 | TITLE_125157894  | IMAGE_723393492  |
| 1330507411 | 2003-05-05 21:40:07 | TITLE_1571250589 | IMAGE_701840418  |
| 1666035620 | 2003-04-26 02:51:40 | TITLE_504713706  | IMAGE_1410357553 |
| 1458179791 | 2003-04-19 05:34:25 | <NO_TITLE>       | <NO-IMAGE>       |
| 1365758305 | 2003-03-28 10:09:58 | TITLE_704554170  | IMAGE_2085080137 |
| 2131082774 | 2003-03-26 16:43:25 | TITLE_1411034929 | IMAGE_303539208  |
|  103396632 | 2003-02-14 09:11:28 | TITLE_915927396  | IMAGE_1381045723 |
|  396479202 | 2003-02-01 15:51:40 | <NO_TITLE>       | <NO-IMAGE>       |
| 2019916250 | 2003-01-30 00:44:46 | <NO_TITLE>       | <NO-IMAGE>       |
|  431091906 | 2003-01-29 13:08:37 | <NO_TITLE>       | <NO-IMAGE>       |
|  705166549 | 2003-01-23 21:37:07 | TITLE_1530318643 | IMAGE_257673696  |
| 1278327049 | 2002-12-31 16:39:40 | TITLE_772845324  | IMAGE_1355754913 |
| 1871174528 | 2002-12-30 19:03:40 | TITLE_65725764   | IMAGE_522904938  |
|  611892727 | 2002-12-22 10:19:07 | TITLE_333758274  | IMAGE_734815032  |
|  758497849 | 2002-12-04 15:05:10 | TITLE_129140574  | IMAGE_244407066  |
|  518111034 | 2002-10-17 16:38:10 | TITLE_1976498683 | IMAGE_2008599775 |
| 1737725786 | 2002-10-15 23:52:52 | TITLE_1441053871 | IMAGE_1595265847 |
| 1206055789 | 2002-10-03 22:07:52 | TITLE_562697952  | IMAGE_198940092  |
|  702790153 | 2002-08-31 17:37:16 | TITLE_1788304903 | IMAGE_1212944101 |
+------------+---------------------+------------------+------------------+
40 rows in set (0.01 sec)

Query OK, 0 rows affected (0.01 sec)

UPDATE 2011-05-17 15:40

Here is the new output including the date of the latest image:

mysql> call showlast40;
+------------+---------------------+------------------+------------------+---------------------+
| author_id  | date_created        | title            | filename         | image_date          |
+------------+---------------------+------------------+------------------+---------------------+
| 1464539515 | 2004-01-07 22:45:04 | TITLE_676010724  | IMAGE_1877060293 | 2003-10-16 02:06:58 |
|  272558724 | 2004-01-03 23:44:19 | TITLE_1213504045 | IMAGE_550812606  | 2003-12-28 07:25:43 |
| 2000476448 | 2003-12-28 18:05:10 | TITLE_1762951489 | IMAGE_1201290847 | 2003-08-31 16:56:01 |
|  955209697 | 2003-12-25 00:55:43 | TITLE_1064749344 | IMAGE_1335865507 | 2003-11-11 18:37:07 |
| 1657429856 | 2003-12-19 01:03:13 | TITLE_1931852743 | IMAGE_905288424  | 2003-09-26 07:20:01 |
|  759381001 | 2003-11-12 10:46:52 | TITLE_878255772  | IMAGE_2014780795 | 2003-09-30 06:54:40 |
| 1269478951 | 2003-11-06 02:06:22 | TITLE_2026098781 | IMAGE_982272966  | 2003-07-28 11:59:10 |
| 1049672131 | 2003-10-04 20:55:34 | TITLE_2043080215 | IMAGE_987859662  | 2003-11-19 05:26:37 |
| 1429108729 | 2003-09-16 19:07:52 | TITLE_424483080  | IMAGE_35379150   | 2003-11-18 22:17:55 |
| 1672198676 | 2003-09-13 11:49:52 | TITLE_1131552745 | IMAGE_875049630  | 2003-06-08 00:42:58 |
| 1645878842 | 2003-08-24 13:42:04 | TITLE_1077302833 | IMAGE_702269538  | 2002-04-02 01:21:49 |
|  172347180 | 2003-08-21 14:26:37 | TITLE_558691044  | IMAGE_1091183587 | 2003-08-13 08:30:22 |
| 1137674509 | 2003-08-15 08:44:37 | TITLE_1982979709 | IMAGE_1234487941 | 2003-12-17 11:53:28 |
|  282998112 | 2003-08-05 10:01:34 | TITLE_353831568  | IMAGE_738487608  | 2003-11-08 22:03:22 |
|  246145344 | 2003-08-02 00:42:31 | TITLE_376954044  | IMAGE_1279375459 | 2003-12-05 02:30:49 |
|  218409162 | 2003-07-14 02:55:16 | TITLE_1932540991 | IMAGE_1078689211 | 2003-07-14 15:59:37 |
|  593263087 | 2003-07-12 22:47:01 | TITLE_1604012533 | IMAGE_834822870  | 2003-09-02 05:48:22 |
| 2115914174 | 2003-07-06 03:06:31 | TITLE_1268165545 | IMAGE_1068632322 | 2003-04-28 16:29:01 |
|  552557275 | 2003-07-01 16:45:22 | TITLE_2022112717 | IMAGE_1410588295 | 2003-11-01 01:55:16 |
| 1500437041 | 2003-06-29 20:05:19 | TITLE_35559258   | IMAGE_159953586  | 2003-08-02 10:34:07 |
| 1098371257 | 2003-06-09 07:29:37 | TITLE_1694076415 | IMAGE_1409619391 | 2004-01-07 01:00:13 |
| 1570373503 | 2003-05-22 16:45:04 | TITLE_125157894  | IMAGE_723393492  | 2003-09-26 23:22:43 |
| 1330507411 | 2003-05-05 21:40:07 | TITLE_1571250589 | IMAGE_701840418  | 2003-11-19 20:57:31 |
| 1666035620 | 2003-04-26 02:51:40 | TITLE_504713706  | IMAGE_1410357553 | 2003-11-18 01:30:04 |
| 1458179791 | 2003-04-19 05:34:25 | <NO_TITLE>       | <NO-IMAGE>       | <NO-IMAGE-DATE>     |
| 1365758305 | 2003-03-28 10:09:58 | TITLE_704554170  | IMAGE_2085080137 | 2003-11-11 16:35:19 |
| 2131082774 | 2003-03-26 16:43:25 | TITLE_1411034929 | IMAGE_303539208  | 2003-05-14 12:59:37 |
|  103396632 | 2003-02-14 09:11:28 | TITLE_915927396  | IMAGE_1381045723 | 2003-12-28 18:26:28 |
|  396479202 | 2003-02-01 15:51:40 | <NO_TITLE>       | <NO-IMAGE>       | <NO-IMAGE-DATE>     |
| 2019916250 | 2003-01-30 00:44:46 | <NO_TITLE>       | <NO-IMAGE>       | <NO-IMAGE-DATE>     |
|  431091906 | 2003-01-29 13:08:37 | <NO_TITLE>       | <NO-IMAGE>       | <NO-IMAGE-DATE>     |
|  705166549 | 2003-01-23 21:37:07 | TITLE_1530318643 | IMAGE_257673696  | 2003-08-23 19:06:22 |
| 1278327049 | 2002-12-31 16:39:40 | TITLE_772845324  | IMAGE_1355754913 | 2003-12-22 16:40:25 |
| 1871174528 | 2002-12-30 19:03:40 | TITLE_65725764   | IMAGE_522904938  | 2003-09-06 07:08:01 |
|  611892727 | 2002-12-22 10:19:07 | TITLE_333758274  | IMAGE_734815032  | 2003-09-22 19:16:43 |
|  758497849 | 2002-12-04 15:05:10 | TITLE_129140574  | IMAGE_244407066  | 2003-07-15 12:38:37 |
|  518111034 | 2002-10-17 16:38:10 | TITLE_1976498683 | IMAGE_2008599775 | 2004-01-06 16:37:34 |
| 1737725786 | 2002-10-15 23:52:52 | TITLE_1441053871 | IMAGE_1595265847 | 2003-11-24 15:23:10 |
| 1206055789 | 2002-10-03 22:07:52 | TITLE_562697952  | IMAGE_198940092  | 2003-08-23 11:56:46 |
|  702790153 | 2002-08-31 17:37:16 | TITLE_1788304903 | IMAGE_1212944101 | 2003-12-19 22:56:01 |
+------------+---------------------+------------------+------------------+---------------------+
40 rows in set (0.01 sec)

Here is the new query

  SELECT
    AAA.author_id,
    AAA.date_created,
    IFNULL(BBB.title,'<NO_TITLE>') title,
    IFNULL(CCC.filename,'<NO-IMAGE>') filename,
    IFNULL(CCC.date_added,'<NO-IMAGE-DATE>') image_date
  FROM
  (
    SELECT
      AA.id,
      AA.date_added,
      BB.author_id,
      BB.date_created
    FROM
    (
      SELECT
        A.id,IFNULL(MAX(B.date_added),'1900-01-01 00:00:00') date_added
        FROM (SELECT id FROM articles ORDER BY date_created DESC LIMIT 40) A
        LEFT JOIN article_images B ON A.id = B.article_id
        GROUP BY A.id
    ) AA
    INNER JOIN articles BB USING (id)
  ) AAA
  LEFT JOIN article_contents BBB ON AAA.id=BBB.article_id
  LEFT JOIN article_images CCC
  ON (AAA.id=CCC.article_id AND AAA.date_added=CCC.date_added)
  ORDER BY AAA.date_created DESC;

My query does one thing nobody else's query does:

My query gets the last 40 articles first before joining to any other table.

Everybody else's queries joins everything first and then attempts to do LIMIT 40 on a large temp table.

My query's running time is not a factor of the amount of data.

Regardless of the sample size I create, my query has to be the fastest !!!

RolandoMySQLDBA
  • 43,883
  • 16
  • 91
  • 132
  • 1
    I may have overlooked something, but I had to add "GROUP BY AAA.id" to the end of the query. Otherwise I was getting duplicate articles for each image attached to them. Does that seem right? – mellowsoon May 17 '11 at 17:47
  • I created the bigger sample. I will post my findings. – RolandoMySQLDBA May 17 '11 at 18:05
  • I posted my findings. The query should work as is. Make sure you indexed the tables as specified in my original answer. – RolandoMySQLDBA May 17 '11 at 18:31
  • I will change the query to show the date_added field for the article's latest image. BRB !!! – RolandoMySQLDBA May 17 '11 at 19:36
  • Changes posted with new column output and the query. – RolandoMySQLDBA May 17 '11 at 19:45
  • It is impossible for MySQL's optimizer to properly optimize such subqueries. Subquery optimizations will only be available in MySQL 6.0. ( http://forge.mysql.com/wiki/6.0_Subquery_Optimization_Benchmarks ) – grateful.dev May 17 '11 at 20:10
  • That's why the query is organized the way it now. It works and works fast !!! This is why I provided the stored procedure to create the tables, populate them, and retrieve the last 40 articles as @mellowsoon specified. My query cannot be refactored (nor does it need to be) any more than its performance displays. – RolandoMySQLDBA May 17 '11 at 20:33
  • In all fairness, the answer from @Denis was the first to describe the correct path. However, he does the LIMIT 40 last after all JOINs. I do the LIMIT 40 first before any JOINs take place. Since subquery optimization is yet in the future (if Oracle bothers to implement it), I had force the subquery to behave the way I wanted. BTW @Denis, +1 for you describing the correct path first. – RolandoMySQLDBA May 17 '11 at 20:46
  • I tested your query and the actual performance on large data collections seems very good although the execution plan does seem bad. Great work. – grateful.dev May 17 '11 at 21:53
  • @Saggi Thank you kindly. I am not a biggest of the MySQL Query Optimizer because sometimes it may rule out LIMIT clauses in subqueries. In fact, I wrote about that in the DBA StackExchange about 3 months ago (http://dba.stackexchange.com/questions/1371/problem-with-mysql-subquery/1384#1384). The EXPLAIN plan you saw is proof of that. Trying to write and refactor queries in MySQL with subqueries can sometimes be like Russian Roulette. – RolandoMySQLDBA May 17 '11 at 22:05
  • Holy crap this is complete. I actually learned something - awesome Rolando! – Jim Rubenstein May 18 '11 at 12:56
  • @RolandoMySQLDBA - The query is indeed very fast, with results between 0.00 and 0.03. The EXPLAIN on the query shows a lot of "Using temporary" and "Using filesort", which I typically try to avoid. I'm assuming I should just look at the speed of the actual query as a measure of performance, and disregard the EXPLAIN. – mellowsoon May 18 '11 at 17:43
  • @mellowsoon Exactly. To circumvent the deficiencies of the MySQL Query Optimizer, you must refactor parts of a query to make temp tables be as small as possible. The order of the running time is deduced from the EXPLAIN plan and would appear bad in terms of the steps taken. So you essentially have to give the MySQL Query Optimizer less to work with. I learned this from watching this video on YouTube about refactoring queries : http://www.youtube.com/watch?v=ZVisY-fEoMw&playnext=1&list=PL0194B59719B45A96 – RolandoMySQLDBA May 18 '11 at 17:53
  • Thanks for the appreciation and your answer provided with commands is good enough but I want you re-make the command just delete the text column from your commands and what would be the output result time as you have whole database available easily which make you easily to do that and if you can do add text column in last join then the timing. Just, I want to know the timing changes and it would also help me to think any other option available between these. You have also done tremendous good job by doing re-search and keep a quick eye on the post – Vineet1982 May 20 '11 at 14:13
3

This is a case where an inline subquery, rather than a join, will work well:

select articles.*,
       article_contents.title,
       article_contents.content,
       (select article_images.filename
       from article_images
       where article_images.article_id = articles.id
       order by article_images.date_added desc
       limit 1
       ) as image_filename
from articles
join article_contents
on article_contents.article_id = articles.id
order by articles.date_created desc
limit 40;

Performance wise, it will nestloop through the top 40 rows of articles, which is the fastest possible plan; and for each of these rows, the subquery will nestloop to the top applicable row in article_images, which also happens to be the fastest plan for a given article.

If you need to fetch more than a single field from the images table, I take it you've an image_id. Assuming so, grab the image_id instead, and then do a second query with an in clause to retrieve the rows you need.

An alternative (and slightly faster) approach will be to use triggers to keep the latest image_id stored in the articles table. Doing will allow you to left join the images directly.

Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
2

Basically you want to do select rows holding group-wise maximum of a field, which was linked by ʞɔıu in MySQL - Control which row is returned by a group by.

In MySQL it is not possible to do an ORDER BY before a GROUP BY. You can do Subselects as described in the first linked answer or as shown in MySQL Group By ordering.

In your case, this will most probably result in bad performance, so it would be possible to use two seperate queries. Not perfectly elegant, but reliable:

SELECT a.id
FROM articles a
ORDER BY a.date_created DESC
LIMIT 40

and for each of the 40 result rows with %ARTICLE_ID% being the current row id:

SELECT ai.filename
FROM article_images ai
WHERE ai.article_id = %ARTICLE_ID%
ORDER BY ai.date_added DESC
LIMIT 1

My advice would be another one: redundancy. Save the latest article image redundantly in your article table. This results in easy and fast queries.

Community
  • 1
  • 1
Alp
  • 29,274
  • 27
  • 120
  • 198
  • Keeping the current image in the articles table is what I've been doing, and I was hoping to drop it. However if that's the answer, then that's the answer. – mellowsoon May 16 '11 at 01:13
  • 1
    I fully read this answer and I understand what you are doing. Your steps could have been combined via subqueries. However, from an application standpoint, your answer can conceivably work, and work fast. From there, it would have to be at the developer's discretion to implement. +1 for your answer. – RolandoMySQLDBA May 24 '11 at 21:04
1

Let me First describe you the reason of time taken by the query to solve this problem. My English is not much good but I will try to explain it best way.

Let's take the example of 2 table join, When we use join table command what is basically done. According to the query, engine scans the first table then it goes to second table and start reading first table and second table and where the query matches, it takes it the memory. Now Let's take the example of 3 tables join, Read 1st table then 2 second and for performing on 3rd sql, if the data is heavy then creates the tmp table and starts reading and then executing the command and if normally then stores in memory.

Now your problem is not the tables joining command but the problem is structure of the 2nd table only. It has the text column and which is minimum use and during searching the engine has to pick that query and perform the read and write function on it and with every write or read the use of memory decreases and the engine become slow and slow and slow.

Solution: There are two solutions for this problem. First solution is to do partition of the second table horizontally and vertically. What will happen with his the engine would make the marking during search and large chunk of data would be written rapidly and the result of timing would increase. But I don't recommend the solution as it would only decease the time little but the time of reading and writing would not reduce. What would I suggest the second Solution.

2nd Solution: Remove the text column from the 2nd table and make the 4th table of author_aritcle_content with 2 column of title and content and join the 4th table in the last. What would happen with this solution the engine first scan 1st table with 10 + 10 bytes and when we move to the second table then scanning would be done in 10 + 10 + 10 + 100 and with 3rd table it would be 10 + 10 + 10+ 100 + 10 + 100 and at this point the engine have to search only 240 bytes and your result are ready and just engine has to add contents and it would be done easily. The time reduction in the query is very large as it have not to take the text column with engine as it would join in the last, only with results which are going to be displayed.

Today, I added 10,000 articles in the table and executed the below query and it has taken 0.2015 secs, Each article has bytes of 65,000 and image name is 10 bytes and the query is :

select 
    a.id,
    a.author_id,
    a.date_created,
    b.title,
    c.filename,
    c.date_added,
    d.content
from
    articles as a,
    article_contents as b,
    article_images as c,
    article_contents as d
where
    a.id = b.article_id and
    b.article_id = c.article_id and
    c.article_id = d.article_id
order by
    a.date_created desc
limit
    0, 40;

I will again post the results with the articles numbers goes to 50,000 and time taken to execute the query.

Vineet1982
  • 7,730
  • 4
  • 32
  • 67
  • I have read your answer and I think the optimizations you are mentioning are great from a diskspace standpoint and in terms of making base tables as small as possible. However, @mellowsoon wanted just 40 rows from the join of three tables and do it as fast as possible. In my solution, I did 3 things to solve the original question: 1) I added an additional index to two tables, 2) I simply forced the very first temp table to have exactly 40 rows from the articles table, and 3) delayed joining tables until the last. In fact, my answer has O(log n) running time as data load increases. – RolandoMySQLDBA May 20 '11 at 13:00
  • 1
    a +1 for your answer. In other submitted answers, people just threw queries out there and hoped it was right. Developers should think more about the query execution plan, table design, and memory considerations like you just did in your answer !!! – RolandoMySQLDBA May 20 '11 at 13:11
  • Thanks for the appreciation and your answer provided with commands is good enough but I want you re-make the command just delete the text column from your commands and what would be the output result time as you have whole database available easily which make you easily to do that and if you can do add text column in last join then the timing. Just, I want to know the timing changes and it would also help me to think any other option available between these. You have also done tremendous good job by doing re-search and keep a quick eye on the post. – Vineet1982 May 20 '11 at 14:11
  • 1
    I was just thinking about your answer again because a client at my employer's web hosting company runs 5-10 queries on a MySQL master in parallel but it serializes to the slave via Replication. When all is said and done, the slave is easily 40000 seconds behind, almost a half day. Reducing the table sizes, or least the temp table sizes, would help this client immensely and reduce his memory consumption. I just have to convince the developers there to implement this idea as well. Thank you again for thinking outside the box. – RolandoMySQLDBA May 21 '11 at 00:48
  • @RolandoMySQLDBA if you can, execute the above query and provide the results of above query time taken. The query is without any JOIN condition. – Vineet1982 May 21 '11 at 11:44
0

Would something like this work?

select * from articles left join (select article_id, filename from article_images group by article_id order by date_added desc) on id = article_id
Christopher Armstrong
  • 7,907
  • 2
  • 26
  • 28
  • Doesn't look like this solution is going to work. The performance was horrible even after trying a number of different indexing schemes. – mellowsoon May 12 '11 at 20:40
0
SELECT a.id, ai.filename
    FROM articles a
        LEFT JOIN (SELECT article_id, MAX(date_added) AS MaxDate
                       FROM article_images 
                       GROUP BY article_id) maxai
                INNER JOIN article_images ai
                    ON maxai.article_id = ai.article_id
                        AND maxai.MaxDate = ai.date_added
            ON a.id = maxai.article_id
    ORDER BY a.date_created DESC
    LIMIT 40
Joe Stefanelli
  • 132,803
  • 19
  • 237
  • 235
  • As with carmstrong's solution, the performance on this query is pretty bad, which I have to take into consideration. – mellowsoon May 12 '11 at 20:57
0

If performance is major concern when searching and displaying results then I propose you alter your table structure so that the last image ID is stored in the article contents table. That way retrieval speed is optimal.

Sacrifice insert speed for a far improved search function.

GordyD
  • 5,063
  • 25
  • 29
0

This one should be pretty efficient as it does not use any derived tables.

SELECT articles.id, articles.author_id, articles.date_created,
            SUBSTRING_INDEX(GROUP_CONCAT(filename ORDER BY article_images.date_added DESC SEPARATOR "\t"),"\t",1) as latestFilename
    FROM articles
        JOIN article_images
            ON articles.id = article_images.article_id
    GROUP BY articles.id, articles.author_id, articles.date_created
    ORDER BY articles.date_created DESC
    LIMIT 40;

The GROUP_CONCAT returns a string list of all filenames in the group, ordered by article_images_date_added DESC and separated by a "\t".
The SUBSTRING_INDEX will cut the string before the first "\t" so only the latest filename will be returned.

(Obviously, you should also have the relevant indexes added on article_images.article_id, article_images.date_created and articles.date_created.)

grateful.dev
  • 1,437
  • 10
  • 21
0

Try something like this

SELECT  A.author_id ,
        Images.article_id ,
        Images.filename ,
        Images.date_added 
FROM    articles AS A
        LEFT JOIN ( SELECT  AI1.article_id ,
                            AI1.filename ,
                            AI1.date_added
                    FROM    article_images AS AI1
                    WHERE   AI1.date_added = ( SELECT   MAX(date_added)
                                               FROM     article_images AS AI2
                                               WHERE    AI2.article_id = AI1.article_id
                                             )
                  ) AS Images ON A.id = Images.article_id 

You will also need to add an index to the article_images table

(article_id ASC, date_added ASC)
John Petrak
  • 2,898
  • 20
  • 31
0
SELECT a.id,ac.title,ac.content,ai.filename

FROM articles a,article_contents ac,article_images ai

WHERE a.id=ac.article_id and a.id=ai.article_id

ORDER BY a.date_created desc

LIMIT 40 

use above query.it will works for you

Hitesh
  • 815
  • 1
  • 6
  • 11
0

Try this if you are sure that each article has an image

SELECT a.author_id,a.date_created,i.filename FROM test.articles a
left join article_images i on i.article_id=a.id
where i.date_added =(select Max(date_added) from article_images i2);

Else if is not necessary to have an image related to each article (which is more logical) this should work:

SELECT a.author_id,a.date_created,i.filename FROM test.articles a
left join article_images i on (i.article_id=a.id and i.date_added =(select Max(date_added) from article_images i2))
Luci
  • 3,174
  • 7
  • 31
  • 36