23

I have searched the site for assistance but still struggling. Here is my table:

messages
========
id
thread_id
user_id
subject
body
date_sent

Basically I want to retrieve the latest record for each thread_id. I have tried the following:

SELECT id, thread_id, user_id, subject, body, date_sent
FROM messages
WHERE user_id=1 AND date_sent=(select max(date_sent))
GROUP BY thread_id
ORDER BY date_sent DESC

BUT it is giving me the oldest records, not the newest!

Anyone able to advise?

EDIT: Table dump:

--
-- Table structure for table `messages`
--

CREATE TABLE IF NOT EXISTS `messages` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `thread_id` int(10) unsigned NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `body` text NOT NULL,
  `date_sent` datetime NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1 AUTO_INCREMENT=34 ;

--
-- Dumping data for table `messages`
--

INSERT INTO `messages` (`id`, `thread_id`, `user_id`, `body`, `date_sent`) VALUES
(1, 1, 1, 'Test Message', '2011-01-20 00:13:51'),
(2, 1, 6, 'Test Message', '2011-01-20 01:03:50'),
(3, 1, 6, 'Test Message', '2011-01-20 01:22:52'),
(4, 1, 6, 'Test Message', '2011-01-20 11:59:01'),
(5, 1, 1, 'Test Message', '2011-01-20 11:59:22'),
(6, 1, 6, 'Test Message', '2011-01-20 12:10:37'),
(7, 1, 1, 'Test Message', '2011-01-20 12:10:51'),
(8, 2, 6, 'Test Message', '2011-01-20 12:45:29'),
(9, 1, 6, 'Test Message', '2011-01-20 13:08:42'),
(10, 1, 1, 'Test Message', '2011-01-20 13:09:49'),
(11, 2, 1, 'Test Message', '2011-01-20 13:10:17'),
(12, 3, 1, 'Test Message', '2011-01-20 13:11:09'),
(13, 1, 1, 'Test Message', '2011-01-21 02:31:43'),
(14, 2, 1, 'Test Message', '2011-01-21 02:31:52'),
(15, 4, 1, 'Test Message', '2011-01-21 02:31:57'),
(16, 3, 1, 'Test Message', '2011-01-21 02:32:10'),
(17, 4, 6, 'Test Message', '2011-01-20 22:36:57'),
(20, 1, 6, 'Test Message', '2011-01-20 23:02:36'),
(21, 4, 1, 'Test Message', '2011-01-20 23:17:22');

EDIT: Apologies - I may have got things slightly confused here - basically what I want is to retrieve all messages for a given user_id, THEN find the latest message (per thread_id) from those retrieved messages.

MAX POWER
  • 5,213
  • 15
  • 89
  • 141
  • Is it the sort order that's the problem or is it selecting the oldest record for each user rather than the newest? – ChrisF Jan 25 '11 at 00:05
  • It's selecting the oldest record instead of the newest. The ORDER BY is working fine. – MAX POWER Jan 25 '11 at 00:10
  • Would you be able to provide a dump of the table in question with some prefilled values so that I can test my query on it. I think I have a much simpler way using a scalar subquery and not using all the group bys cross joins etc. – andrew Jan 25 '11 at 00:36
  • To avoid costly subquery, check my answer to the similar question: http://stackoverflow.com/a/33348557/1630623 – Frane Poljak Oct 26 '15 at 14:44

4 Answers4

40
SELECT 
    id, thread_id, user_id, subject, body, date_sent
FROM
    messages
WHERE
    date_sent IN (SELECT 
            MAX(date_sent)
        FROM
            messages
        WHERE
            user_id = 6
        GROUP BY thread_id)
ORDER BY thread_id ASC, date_sent DESC;

Let me know if it works now

Akash Kumar Verma
  • 3,185
  • 2
  • 16
  • 32
Ashraf Abrahams
  • 769
  • 6
  • 11
4

It's a two stop process. First find the newest dates for each thread_id. Then select records that have these dates and matching thread_ids

SELECT t.id, t.thread_id, t.user_id, t.body, t.date_sent
FROM messages AS t
CROSS JOIN (
  SELECT thread_id, MAX(date_sent) AS date_sent FROM messages WHERE user_id = 1 GROUP BY thread_id
) AS sq
USING (thread_id, date_sent)

Note that if two (or more) messages have same date_sent and same thread_id they will both be selected (because you can't tell which one is newer)

Mchl
  • 61,444
  • 9
  • 118
  • 120
  • It's all true. This operation is often called "groupwise max". For even more examples, read this: http://jan.kneschke.de/projects/mysql/groupwise-max/ ...or Google for "groupwise max". – TehShrike Jan 25 '11 at 00:11
  • Thanks for this. Can you provide an example that uses a similar syntax to what I have used, i.e. without the CROSS JOIN and USING? I know your method is probably more effective but I prefer using the simple sytnax that I'm used to :) – MAX POWER Jan 25 '11 at 00:18
  • Provide a dump of you table with some prefilled values and I think I can write you a much simpler query usig a scalar subquery. And it won't rely on cross joins or group bys – andrew Jan 25 '11 at 00:49
  • OK done. Please note there in fact no 'subject' feld as per my original post. – MAX POWER Jan 25 '11 at 01:00
  • I refuse to provide any 'simpler' syntax because this synax IS simple and you better get used to it if you want use SQL effectively. I have updated the query according to your edited question. – Mchl Jan 25 '11 at 07:38
2

This is a really old question but anyway...

Your where clause isn't specific enough, and using date_sent to select the right record is just wrong. Try this:

SELECT id, thread_id, user_id, subject, body, date_sent
FROM messages
WHERE id=(
    select m2.id from messages m2
    where messages.thread_id=m2.thread_id
    order by date_sent desc limit 1)
ORDER BY date_sent DESC

If you want to assume that id always increases over time, this would probably perform better:

SELECT id, thread_id, user_id, subject, body, date_sent
FROM messages
WHERE id in (
    select max(m2.id) from messages m2 group by m2.thread_id)
ORDER BY date_sent DESC
PaulC
  • 505
  • 4
  • 10
1

From what I can see, your problem lies with the sub-query. The sub-query will actually be extracting the maximum date_sent field from the current record, in other words, as the outer query traverses the table one record at a time the two date_sent fields in the sub-query "date_sent=(select max(date_sent)" will always be the same. After it displays the first record for a particular thread_id, it doesn't show any other records for that thread_id since you grouping by thread_id. This is why, it will always show the first record entered for each thread_id. BTW, it's showing the first record entered for each thread_id and not the earliest date_sent record. Your result is dependent on the location of the record in your table and not on the value of date_sent. Not sure if I explained this correctly, but anyway, to fix your problem try:

SELECT id, thread_id, user_id, subject, body, date_sent
FROM messages
WHERE user_id=1 AND date_sent IN (select max(date_sent) from messages GROUP BY thread_id)
GROUP BY thread_id
ORDER BY date_sent DESC;

Firstly, the sub-query must have a FROM clause and a GROUP BY clause to pull up the maximum dates per thread_id from the WHOLE table and not just the current record. Also, the = must be replaced with an IN since the sub-query might result in multiple records. If the table contains two records of the same thread-id on the same date, only the first one will be displayed. This is caused by the second GROUP BY clause in the outer query. To display all the records for that thread_id on that day, try:

SELECT id, thread_id, user_id, subject, body, date_sent
FROM messages
WHERE user_id =1 AND date_sent IN (SELECT MAX( date_sent ) FROM messages GROUP BY thread_id)
ORDER BY thread_id ASC , date_sent DESC;

By removing the second GROUP BY clause and adding an ORDER BY clause, you can show all messages for that maximum date for each thread_id and still display the threads in the correct order. Hope that helps.

prayagupa
  • 30,204
  • 14
  • 155
  • 192
Ashraf Abrahams
  • 769
  • 6
  • 11
  • Hi mate, thanks for this. I tried both your queries, but each time it is not returning the correct number of records for a given user_id. There are no duplicate dates in my dataset. – MAX POWER Jan 25 '11 at 02:07
  • OK I may have got things slightly confused here - basically what I want is to retrieve all messages for a given user_id, THEN find the latest message (per thread_id) from those retrieved messages. – MAX POWER Jan 25 '11 at 02:45
  • This answer has the same problem as the first. Basically, you are assuming that every record has a different date, which will work until the database gets busy. – PaulC Mar 13 '15 at 21:32