3

I'm trying to get the

  • latest thread (id, topic, timestamp, author_id) and
  • latest post (id, thread_id, timestamp, author_id)
  • of each forum (id, name)
  • ordered by the latest post, indepent from the thread's creationdate.

Why?

I'd like to be able to display details like:

"The latest Answer of forum $forum_id was given on Question $thread_id. Here it is: $post_id"

SELECT  f.id AS forum_id,
        f.name AS forum_name,
        t.id AS thread_id,
        t.topic AS thread_topic,
        t.ts AS thread_timestamp,
        p.id AS post_id,
        p.content AS post_content,
        p.ts AS post_timestamp

 FROM   forums f,
        threads t,
        posts p

WHERE   f.id = t.forum_id 
  AND   t.id = p.thread_id

GROUP BY f.id
ORDER BY p.ts

Any advices, how to change the SQL to get the wanted result as much performant as possible? I'm trying to avoid subqueries but I'm open-minded!

Thanks in advance!

Mr. B.
  • 8,041
  • 14
  • 67
  • 117

2 Answers2

3

Since MySQL doesn't support window functions, I don't think there's any way to do this without a subquery:

SELECT  f.id AS forum_id,
    f.name AS forum_name,
    t.id AS thread_id,
    t.topic AS thread_topic,
    t.ts AS thread_timestamp,
    p.id AS post_id,
    p.content AS post_content,
    p.ts AS post_timestamp

FROM   forums f
JOIN (SELECT t2.forum_id, max(p2.ts) as ts
      FROM posts p2
      JOIN threads t2 ON p2.thread_id = t2.id
      GROUP BY t2.forum_id) max_p ON f.id = max_p.forum_id
JOIN   posts p ON max_p.ts = p.ts
JOIN   threads t ON f.id = t.forum_id AND p.thread_id = t.id
ORDER BY p.ts

Naturally, caching the latest results would let you do this without the performance penalty of calling MAX(), but with the right indices, this shouldn't be much of an issue...

UPDATE

The most concise way of including the threads without posts and forums without threads would be to use LEFT JOINs instead of an INNER JOINs:

SELECT  f.id AS forum_id,
    f.name AS forum_name,
    t.id AS thread_id,
    t.topic AS thread_topic,
    t.ts AS thread_timestamp,
    p.id AS post_id,
    p.content AS post_content,
    p.ts AS post_timestamp

FROM   forums f
LEFT JOIN (SELECT t2.forum_id, max(COALESCE(p2.ts, t2.ts)) as ts, COUNT(p2.ts) as post_count
      FROM threads t2 
      LEFT JOIN posts p2 ON p2.thread_id = t2.id
      GROUP BY t2.forum_id) max_p ON f.id = max_p.forum_id
LEFT JOIN   posts p ON max_p.ts = p.ts
LEFT JOIN   threads t ON f.id = t.forum_id AND (max_p.post_count = 0 OR p.thread_id = t.id)
ORDER BY p.ts
PinnyM
  • 35,165
  • 3
  • 73
  • 81
  • Thanks for your reply. It's almost what I wanted, but I just want one row per forum and it's (latest) details. Do you know, how to fix it? – Mr. B. Jun 20 '13 at 20:17
  • @Mr.Bombastic: Thinking through the problem again - you should only need one (more complex) subquery to get the data you need. This will return the most recent post per forum and the thread associated with that post. Does this work for you? – PinnyM Jun 20 '13 at 21:17
  • Thanks for your time! I'm getting this error by using your code: #1054 - Unknown column 't.forum_id' in 'on clause'. Please notice, I changed two column-names to get a better overview: threads.ts is now threads.timestamp_created and posts.ts now posts.timestamp_posted. – Mr. B. Jun 20 '13 at 21:27
  • 1
    @Mr.Bombastic: My bad, how about now? – PinnyM Jun 20 '13 at 21:37
  • Great, it's working almost perfect. Just two more details I need to get fixed: 1) new topics without any posts are not listed. In this case I'd like to be able to display something like "Latest Question ($thread_id) of Forum $forum_id has no answers yet.". 2) If a forum has no threads it should be listed anyway (with empty fields). Almost done. Very big THANK YOU for your help! – Mr. B. Jun 21 '13 at 11:48
  • @Mr.Bombastic: these requirements are somewhat contradictory, or perhaps I'm missing something. You want the latest post for each forum, but you also want the latest thread for each forum where no post has been associated. What if a post has been added to a different thread on the forum after the empty thread was created? In this case, do you want to see the latest post for the forum, or the empty thread, or both? – PinnyM Jun 21 '13 at 14:36
  • I think, it's because of my bad English. To be as less contradictory as possible, here's how it should look like: http://jsfiddle.net/hYmVt/ I hope, it explains what I tried to do. :-) – Mr. B. Jun 21 '13 at 15:02
  • @Mr.Bombastic: Still not clear. If Forum2 had several questions of which the last was not answered, but others were answered _after the last question was asked_... should Forum2 display the last question, or the question that was last answered (since that is more recent than the last question), or both? – PinnyM Jun 21 '13 at 15:48
  • The last thread will be displayed on top UNTIL there's a new post of another (maybe very old) thread. Please take a look at this table: http://tinyurl.com/poa8pnd Hope it's getting clear. – Mr. B. Jun 21 '13 at 16:31
  • @Mr.Bombastic: Understood - does this updated code work for you? – PinnyM Jun 21 '13 at 18:00
  • Finally it looks perfect. Couldn't find any problems for now, but will test it extensive at the weekend. I realized, I need to increase my low-level SQL skills. Thanks again! – Mr. B. Jun 21 '13 at 20:05
1

I can think of two "proper" ways of doing this. The first is using joins and subqueries:

SELECT  f.id AS forum_id,
        f.name AS forum_name,
        t.id AS thread_id,
        t.topic AS thread_topic,
        t.ts AS thread_timestamp,
        p.id AS post_id,
        p.content AS post_content,
        p.ts AS post_timestamp
 FROM   forums f join
        threads t
        on f.id = t.forum_id join
        posts p
        on t.id = p.thread_id
WHERE   t.ts = (select ts from threads t2 where t2.forum_id = t.forum_id order by ts desc limit 1) and
        p.ts = (select ts from posts p2 where p2.thread_id = p.thread_id order by ts desc limit 1)
GROUP BY f.id
ORDER BY max(p.ts)

The problem with this approach is that this returns the most recent thread and the most recent post on that thread. Fixing this is cumbersome (and that might be what you really want.)

The subqueries get the latest date on for threads and posts. Performance depends on the indexes that you have. It might be acceptable. This is standard SQL.

The other is a trick with substring_index()/group_concat(), which is specific to MySQL:

SELECT  f.id AS forum_id,
        f.name AS forum_name,
        substring_index(group_concat(t.id order by t.ts desc separator '|'), '|', 1) AS thread_id,
        substring_index(group_concat(t.topic order by t.ts desc separator '|'), '|', 1)  AS thread_topic,
        substring_index(group_concat(t.ts order by p.ts desc separator '|'), '|', 1)  AS thread_timestamp,
        substring_index(group_concat(p.id order by p.ts desc separator '|'), '|', 1)  AS post_id,
        substring_index(group_concat(p.content order by p.ts desc separator '|'), '|', 1)  AS post_content,
        substring_index(group_concat(p.ts order by p.ts desc separator '|'), '|', 1)  AS post_timestamp
 FROM   forums f join
        threads t
        on f.id = t.forum_id join
        posts p
        on t.id = p.thread_id
GROUP BY f.id
ORDER BY max(p.ts);

This version might perform better (because you are already incurring the overhead of a group by). The separator character has to be chosen so it is not in any of the values. Otherwise, only the portion before the separator will appear.

One advantage is that the threads and posts are treated independently, so you get the most recent thread and, separately, the most recent post. You can get the most recent post on a given thread by changing the order by conditions in the group_concat().

Also, to get the ordering you want, you need to order by max(p.ts) rather than just p.ts. The latter would order by an arbitrary time stamp on the forum; there is no guarantee it would be the most recent one.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • It sounds great, but unfortunately the SQL just returns an error: #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'order by t.ts desc), '|', 1) AS thread_id, substring_index(group_concat(' at line 3 – Mr. B. Jun 20 '13 at 20:24
  • 1
    @Mr.Bombastic . . . Gosh, I always thought that the ordering of `separator` and `group by` were irrelevant. But the ordering makes a difference. I learned something. – Gordon Linoff Jun 20 '13 at 20:30
  • Learning never stops. :-) There's just one tiny problem left: anyhow it concats several columns (p.content), like this: "content of post #1, content of post #3, content of post #9, ...". – Mr. B. Jun 20 '13 at 20:42
  • @Mr.Bombastic . . . In fixing one problem, I created another. The version now has `separator '|'` for the content as well. – Gordon Linoff Jun 20 '13 at 20:44
  • Here we go again :-) #1064 - You have an error [...] near 'separtor '|'), '|', 1) AS post_content, substring_index(group_concat(p.' at line 7 – Mr. B. Jun 20 '13 at 20:52
  • @Mr.Bombastic . . . If you are content with the most recent post on the most recent thread, as opposed to the most recent post overall. (And the spelling error causing the other problem is fixed.) – Gordon Linoff Jun 20 '13 at 20:58
  • There's still a difference: solution 1 returns everything correct. solution 2 returns the correct forum and latest post but NOT the thread where the latest post was posted in. – Mr. B. Jun 20 '13 at 21:06
  • @Mr.Bombastic . . . I note this in the description of how they work. You can fix that by changing the `ordering` clause on the `group concat`. The question specifically says "the latest post . . . on each forum" NOT "the latest post on the latest thread of each forum". – Gordon Linoff Jun 20 '13 at 21:07
  • @Mr.Bombastic: The first solution will get the most recent post for the most recent thread, not the most recent post for the forum – PinnyM Jun 20 '13 at 21:15
  • @PinnyM unfortunately you're right. Both solutions are not working as I would like. – Mr. B. Jun 20 '13 at 21:17
  • @Mr.B. Did you ever get this solved? I want the exact same but Im struggling to get it working with the issues that PinnyM described. – Christoffer Feb 11 '17 at 20:32
  • @Christoffer I don't remember how I solved it, but I had a similar topic recently (sorting chat messages). I selected all the messages + replies and sorted them with `usort()` (Symfony + Doctrine). It's not a good practice for a forum, but as I'm not having many datasets, it's okay for now. – Mr. B. Feb 12 '17 at 12:29