2

I am trying to build a page where a list of messages, which a member has received, is displayed. I came up with this mySQL query but it runs extremely slow. It takes at least 10 seconds to execute it.

SELECT senderid,
       receiverid
FROM   messages
WHERE  ( receiverid, sentdate ) IN (SELECT receiverid,
                                           Max(sentdate)
                                    FROM   messages
                                    WHERE  receiverid = '1'
                                    GROUP  BY senderid)
ORDER  BY sentdate DESC 

This is the database I'm using:

`autoID` mediumint(11) unsigned NOT NULL AUTO_INCREMENT
`senderID` mediumint(11) unsigned DEFAULT '0'
`receiverID` mediumint(11) unsigned DEFAULT '0'
`sentDate` datetime DEFAULT '0000-00-00 00:00:00'
`message` longtext
PRIMARY KEY (`autoID`)
KEY `receiverID` (`receiverID`)
KEY `senderID` (`senderID`)

The database has only 150,000 entries. I'm running my own dedicated server with only that mySQL database on it.

Any help is greatly appreciated.

G-Nugget, I've added an index on sentDate but no speed increase. Here's the EXPLAIN (sorry the way is shown. not sure how else I can do it):

id:1
select_type:PRIMARY
table:messages
type:ALL
possible_keys:null
keys:null
key_len:null
ref:null
rows:149433
Extra:Using where; Using filesort

id:2
select_type:DEPENDENT SUBQUERY
table:messages
type:ref
possible_keys:receiverID
key:receiverID
key_len:4
ref:const
rows:20
Extra:Using where; Using temporary; Using filesort

  • 1
    possible duplicate of [Fetch the row which has the Max value for a column](http://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column) – Bill Karwin Nov 27 '12 at 01:23
  • Please post an `EXPLAIN` of the query. It looks like you could benefit from an index on `sentdate`. – G-Nugget Nov 27 '12 at 01:51
  • 1
    Hey, folks, this is a mySQL question. People voting to close it are claiming it's a duplicate with an Oracle question. It isn't! – O. Jones Nov 27 '12 at 01:52
  • @OllieJones People don't read, and closing stuff is fun! – Kermit Nov 27 '12 at 01:58
  • 1
    I'm almost positive that that query doesn't do what you actually intend, in which case, I think you should correct it (and simplify it) before you set about optimizing it. – ruakh Nov 27 '12 at 02:00
  • This seems to display a unique list of senderids for receiverid = 1, in order of the most recently send message from every distinct senderid. But, boy oh boy, does it do it the hard way! I agree with ruakh that it would be helpful to think through exactly what result set you want, and refactor / rewrite this query before optimizing it. By the way, 10 sec isn't horrible timing for something that summarizes 150K rows. – O. Jones Nov 27 '12 at 02:06
  • I really like the way you presented your EXPLAIN. – O. Jones Nov 27 '12 at 16:13

2 Answers2

1

How about this query to yield the result set you mention?

SELECT m.senderid,
       m.receiverid
  FROM messages m
  JOIN (
            SELECT  max(autoID) autoID, 
                    receiverID, SenderID
              FROM  messages
          GROUP BY  receiverId, SenderID
       ) X on m.autoID = x.AutoId
  WHERE m.receiverId = '1'
  ORDER BY m.autoID desc

This takes advantage of the assumption that autoID and sentdate are most likely both monotonically increasing as time goes by. It pulls the ID for the most recent message between each distinct pair of sender/receiver, then uses those IDs to pick a subset of the messages table to display.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • I was thinking the same as you... HOWEVER, add the WHERE clause to the INNER select too. Only gather messages for ReceiverID = '1' vs ALL possible message receivers. No order by either as it will return one record for the receiver... the last one. – DRapp Nov 27 '12 at 02:48
  • Ollie Jones your query resulted in loading the page in < 1 second! Absolutely awesome. And following @DRapp advice made the page even faster. You guys rock!! Thank you so much!!!! – Jakob Herrmann Nov 27 '12 at 03:06
0

In MySQL, subqueries with in do not optimize correctly. Your query is a bit complicated. I think the is an efficient way to write the query:

SELECT senderid,
       receiverid
FROM   messages m
WHERE  m.receiverid = '1' and
       exists (SELECT 1
               FROM   messages m1
               WHERE  m1.receiverid = '1'
               GROUP  BY m1.senderid
               having max(sentdate) =  m.sentdate)
ORDER  BY sentdate DESC 

It seems like your query is trying to get the most recent message from each send to '1'. A simpler version might be:

select senderid, receiverid
from messages m
where m.receiverid = '1'

That is, each senderid is going to be included in your original query, because their most recent sentdate will match the in condition. You could have two senders with exactly the same date and time. Is there a need to show such duplicates?

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon for your answer. Since Ollie's answer worked so well I just went with it. I do greatly appreciate your help though. – Jakob Herrmann Nov 27 '12 at 03:06
  • @JakobHerrmann . . . His answer is probably better for your problem. When I started answering, I didn't realize there was a group by in the subquery. However, does the second query also answer your problem or am I missing something? – Gordon Linoff Nov 27 '12 at 03:34