3

I have a query that returns me the result in descending order like this.

  comment     postid      name      userid    tempid
  ----------------------------------------------------- 
    c1          199        User1      123321    1
    c2          199        User1      123321    2
    c3          199        User1      123321    3
    c4          199        User1      123321    4
    c5          199        User1      123321    5
    c6          199        User1      123321    6
    c7          198        User1      123321    7
    c8          198        User1      123321    8
    c9          198        User1      123321    9
    c10         197        User1      123321    10
    c11         197        User1      123321    11
    c12         197        User1      123321    12
    c13         197        User1      123321    13
    c14         197        User1      123321    13
    c15         197        User1      123321    13
    c16         197        User1      123321    13

Now i want to select top 5 records for each postid.The dersired result should be

  comment     postid      name      userid    tempid
  ----------------------------------------------------- 
    c1          199        User1      123321    1
    c2          199        User1      123321    2
    c3          199        User1      123321    3
    c4          199        User1      123321    4
    c5          199        User1      123321    5
    c7          198        User1      123321    7
    c8          198        User1      123321    8
    c9          198        User1      123321    9
    c10         197        User1      123321    10
    c11         197        User1      123321    11
    c12         197        User1      123321    12
    c13         197        User1      123321    13
    c14         197        User1      123321    13

Here's my query.

 DECLARE rangee INT;
 DECLARE uid BIGINT;

 SET @rangee = plimitRange * 10; 
 SET @uid    = puserid;

PREPARE STMT FROM
'
SELECT comments.comment,comments.postid,user.name,comments.userid,comments.tempid 
FROM
user
INNER JOIN comments ON user.userid=comments.userid
INNER JOIN posts ON posts.postID = comments.postid
 WHERE 
comments.postid <=  
(SELECT MAX(postid) FROM
 (
  SELECT wall.postid FROM wall,posts WHERE  
  wall.postid = posts.postid AND posts.userid=?
  ORDER BY wall.postid DESC LIMIT 10 OFFSET ?
 )sq1
)

AND
comments.postid >= 
(SELECT MIN(postid) FROM
 (
   SELECT wall.postid FROM wall,posts WHERE
   wall.postid = posts.postid AND posts.userid=?
   ORDER BY wall.postid DESC LIMIT 10 OFFSET ?
  )sq2
)

 AND
 posts.userid = ?
 ORDER BY comments.postid DESC,comments.tempid DESC;
'; 
EXECUTE STMT USING @uid,@rangee,@uid,@rangee,@uid;
DEALLOCATE PREPARE STMT;

How can I accomplish this?

pilcrow
  • 56,591
  • 13
  • 94
  • 135
Mj1992
  • 3,404
  • 13
  • 63
  • 102
  • I would imagine getting the comments table from a subquery with a limit on it, rather than a straight join would work. Possibly related: http://stackoverflow.com/q/2856397/438971 – Orbling Oct 23 '12 at 14:26
  • thnx for the reply but can you help me by specifing it in the query? – Mj1992 Oct 23 '12 at 14:50
  • Did you look at the link in the comments on the answer? Advanced sampling in MySQL, it's a little confusing: http://explainextended.com/2009/03/06/advanced-row-sampling/ – Orbling Oct 23 '12 at 14:54
  • I looked at it but i was not able to understand it. – Mj1992 Oct 23 '12 at 14:57
  • Just did an example using an orders table in a database I work with to show all the orders of the top 10 customers (by number of orders), to show the method: `SELECT o.\`customer_id\`, o.\`order_id\`, o.\`total_price\` FROM \`orders\` o JOIN (SELECT oo.\`customer_id\` FROM \`orders\` oo GROUP BY oo.\`customer_id\` ORDER BY COUNT(oo.\`order_id\`) DESC LIMIT 10) ooo ON o.\`customer_id\` = ooo.\`customer_id\`` – Orbling Oct 23 '12 at 15:26
  • possible duplicate of [Greatest n-per-group With Multiple Joins](http://stackoverflow.com/questions/10712646/greatest-n-per-group-with-multiple-joins) – pilcrow Oct 23 '12 at 16:26

1 Answers1

3

You have to use a subquery to materialise a table that contains a rank within each group (by post) for every record (comment). Then, in the outer query, you can filter for only those records with a rank within the desired range (e.g. [1,5] for top 5):

-- select top 5 comments of each of the user's desired posts
SELECT   comments.comment,
         comments.postid,
         user.name,
         comments.userid,
         comments.tempid
FROM     user JOIN (
           -- rank comments on user's desired posts by grouping a self-join
           -- use index (postid, tempid) for performance
           SELECT      c1.*, COUNT(*) rank
           FROM        (
                         -- select user's posts within desired range
                         SELECT postid
                         FROM   (
                                  -- rank user's posts by grouping a self-join
                                  -- use index (userid, postid) for performance
                                  SELECT      p1.postid, COUNT(*) rank
                                  FROM        posts p1
                                    LEFT JOIN posts p2
                                           ON p1.userid = p2.userid
                                          AND p1.postid < p2.postid
                                  WHERE       p1.userid = @uid
                                  GROUP BY    p1.postid
                                ) ranked_posts
                         WHERE  rank BETWEEN @rangee + 1 AND @rangee + 10
                       ) interesting_posts
                  JOIN comments c1 USING (postid)
             LEFT JOIN comments c2
                    ON c1.postid = c2.postid
                   AND c1.tempid < c2.tempid
           GROUP BY c1.postid
         ) comments USING (userid)
WHERE    comments.rank BETWEEN 1 AND 5
ORDER BY postid DESC, tempid DESC
eggyal
  • 122,705
  • 18
  • 212
  • 237
  • `rank comments on user's desired posts` can you explain what do you mean by this ? and what does `c1.*, IFNULL(COUNT(*),0) rank` this part is doing what is `rank` ???????? – Mj1992 Oct 23 '12 at 17:31
  • You wanted to select the top N comments, so we need to rank the comments on each post and then limit our results to only those within the rank interval `[1, 5]`. In that part `rank` is an alias (a column name) for the result of the preceding expression. – eggyal Oct 23 '12 at 17:36
  • when I run your query it says `column rank cannot be null` and can you please explain what does `IFNULL(COUNT(*),0)` this means ?? – Mj1992 Oct 23 '12 at 17:47
  • I've removed [`IFNULL()`](http://dev.mysql.com/doc//en/control-flow-functions.html#function_ifnull) from my query, as it was unnecessary... please refresh. – eggyal Oct 23 '12 at 17:49
  • OK so you are assigning numbers(rank) to each comment according to post meaning that if post `1` has 7 comments than i will rank them from 1 to 7 and then if post `2` has `4` comments i will rank them from 1 to 4 now in the upper query i will be using where clause like `comments.rank BETWEEN 1 AND 5` this and get 5 values from each postid ????? – Mj1992 Oct 23 '12 at 18:02
  • One final thing that i wanted to ask is that you mentioned `use index (postid, tempid) for performance` in your answer.But i didn't got it i mean where are you performing the indexing in the query and how? – Mj1992 Oct 23 '12 at 18:12
  • If the query runs fast enough, don't worry about it. If it's slow, then create those indexes on the relevant tables - for example `ALTER TABLE comments ADD INDEX(postid, tempid)`. – eggyal Oct 23 '12 at 18:38