3

Ok, hours of SO digging, still I haven found a solution for a - IMO rather obvious - task. I have posts and I want to query up to 5 comments per post (the newest maybe).

So basically something like this:

SELECT p.id, p.title, c.id, c.text
FROM posts p
LEFT JOIN comments c ON p.id = c.postId LIMIT 5

(Pseudo, does not work)

How to LIMIT a JOIN?

Aron Woost
  • 19,268
  • 13
  • 43
  • 51
  • Why would you not just put the `LIMIT` on the overall result set and couple that with an `ORDER BY` clause to achieve your desired result set? – mituw16 Jun 29 '15 at 18:46
  • http://stackoverflow.com/a/24442655/861704 similar. – Jigar Jun 29 '15 at 18:54

3 Answers3

2
SELECT  *
FROM    posts p
LEFT JOIN
        comments c
ON      c.post_id = p.id
        AND c.id >=
        COALESCE(
                (
                SELECT  ci.id
                FROM    comments ci
                WHERE   ci.post_id = p.id
                ORDER BY
                        ci.post_id DESC, ci.id DESC -- You need both fields here for MySQL to pick the right index
                LIMIT   4, 1
                ), 0
                )

Create an index on comments (post_id) or comments (post_id, id) (if comments is MyISAM) for this to work fast.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
  • 1
    Can you explain a little more whats going on here? – Aron Woost Jun 29 '15 at 19:31
  • 1
    @AronWoost: for each post, you select the fifth largest comment id (or 0 if there are less than 5 comments) and select all comments which have the given post id and comment id equal or greater than the 5-th largest (or 0 if there are less than 5). – Quassnoi Jun 29 '15 at 19:34
  • Would it be safer to order by comment date, instead of id? (Assuming comment date is being recorded, even though it's not shown in the question). – AdamMc331 Jun 29 '15 at 20:06
  • @McAdam331: safer for what? – Quassnoi Jun 29 '15 at 20:09
  • Well, while it's most common, I've seen cases where you can't guarantee the largest id also represents the latest date, so I thought it was better to order by date to ensure you got the latest values. – AdamMc331 Jun 29 '15 at 20:10
  • 1
    @McAdam331: MySQL won't let you do a tuple comparison efficiently, so you'll have to do three subqueries for that. But that's doable if it's a problem. – Quassnoi Jun 29 '15 at 20:12
  • @Quassnoi Don't go complicating your answer, I was just curious about your take on it, always happy to have discussion. :) – AdamMc331 Jun 29 '15 at 20:13
  • @McAdam331: you'll need this condition: `c.date > date5 OR (c.date = date5 AND c.id >= id5)`, with a subquery instead of each instance of `date5` and `id5`. – Quassnoi Jun 29 '15 at 20:17
2

This looks like a [greatest-n-per-group] problem. The link is to the other tagged question on this site. I would start by getting all posts/comments as you have, and then you can limit it to the most recent 5 for each post like this:

SELECT p1.*, c1.*
FROM posts p1
LEFT JOIN comments c1 ON c1.post_id = p1.id
WHERE(
   SELECT COUNT(*)
   FROM posts p2
   LEFT JOIN comments c2 ON c2.post_id = p2.id
   WHERE c2.post_id = c1.post_id AND c2.commentDate >= c1.commentDate
) <= 5;

Here is another reference on the topic.

Community
  • 1
  • 1
AdamMc331
  • 16,492
  • 10
  • 71
  • 133
  • You probably mean `WHERE c2.post_id = c1.post_id AND c2.id >= c1.id` in line 8. Also c1 and p1 instead of c and p in line 3. – piotrm Jun 29 '15 at 19:17
  • I did not mean to connect them based on post, but based on the comments themselves. You are right, though, about missing the 1s on line three. – AdamMc331 Jun 29 '15 at 19:19
  • While you can compare dates instead of ids for comments you still have to correlate that subquery by post_id, because if you do it by comment_id the count will always return 1, which is always <= 5, resulting in whole that where subquery being redundant. And you will just see all rows. – piotrm Jun 29 '15 at 23:24
  • I see it now, and I've fixed that. Had the OP posted some sample data, I would have tested this before posting. – AdamMc331 Jun 29 '15 at 23:40
0

You can do it using variables:

SELECT pid, title, cid, text
FROM (
  SELECT p.id AS pid, p.title, c.id AS cid, c.text,
         @row_number:= IF(@pid = p.id,
                          IF (@pid:=p.id, @row_number+1, @row_number+1),
                          IF (@pid:=p.id, 1, 1)) AS rn
  FROM posts p
  CROSS JOIN (SELECT @row_number := 0, @pid := 0) AS vars
  LEFT JOIN comments c ON p.id = c.postId 
  ORDER BY p.id ) t  <-- add comments ordering field here
WHERE t.rn <= 5

Demo here

Giorgos Betsos
  • 71,379
  • 9
  • 63
  • 98