0

Background:

I run a platform which allows users to follow creators and view their content.

The following query successfully displays 50 posts ordered by popularity. There is also some other logic to not show posts the user has already saved/removed, but that is not relevant for this question.

Problem:

If one creator is particularly popular (high popularity), the top 50 posts returned will nearly all be by that creator.

This skews the results as ideally the 50 posts returned will not be in favor of one particular author.

Question:

How can I limit it so the author (which uses the field posted_by) is returned no more than 5 times. It could be less, but definitely no more than 5 times should one particular author be returned.

It should still be finally ordered by popularity DESC

SELECT * 
FROM   `source_posts` 
WHERE  `posted_by` IN (SELECT `username` 
                       FROM   `source_accounts` 
                       WHERE  `id` IN (SELECT `sourceid` 
                                       FROM   `user_source_accounts` 
                                       WHERE  `profileid` = '100')) 
       AND `id` NOT IN (SELECT `postid` 
                        FROM   `user_posts_removed` 
                        WHERE  `profileid` = '100') 
       AND `live` = '1' 
       AND `added` >= Date_sub(Now(), INTERVAL 1 month) 
       AND `popularity` > 1 
ORDER  BY `popularity` DESC 
LIMIT  50 

Thank you.

Edit:

I am using MySQL version 5.7.24, so unfortunately the row_number() function will not work in this instance.

John Cliven
  • 973
  • 1
  • 8
  • 21
  • 1
    What is the MySQL server version ? Also, there is just too many `IN(..` based subqueries. This will not scale well when data gets bigger. – Madhur Bhaiya Nov 20 '18 at 18:59

2 Answers2

1

In MySQL 8+, you would simply use row_number():

select sp.*
from (select sp.*,
             row_number() over (partition by posted_by order by popularity desc) as seqnum
      from source_posts sp
     ) sp
where seqnum <= 5
order by popularity desc
limit 50;

I'm not sure what the rest of your query is doing, because it is not described in your question. You can, of course, add additional filtering criteria or joins.

EDIT:

In earlier versions, you can use variables:

select sp.*
from (select sp.*,
             (@rn := if(@p = posted_by, @rn + 1,
                        if(@p := posted_by, 1, 1)
                       )
             ) as rn
      from (select sp.*
            from source_posts sp
            order by posted_by, popularity desc
           ) sp cross join
           (select @p := '', @rn := 0) params
     ) sp
where rn <= 5
order by popularity desc
limit 50;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you - this does work locally however the legacy server that runs my database is on MySQL version 5.7.24. Is there any alternative? – John Cliven Nov 21 '18 at 10:11
1

Could try the row number function. Using that, it would assign each employee a distinct "id." So if one employee had 50 records, only those with a row_number (named as "rank") less than or equal to 5 would be returned.

Select *
from(   
 SELECT `source_posts.*`, row_number() over (partition by `username` order by `popularity` desc) as rank
    FROM   `source_posts` 
    WHERE  `posted_by` IN (SELECT `username` 
                           FROM   `source_accounts` 
                           WHERE  `id` IN (SELECT `sourceid` 
                                           FROM   `user_source_accounts` 
                                           WHERE  `profileid` = '100')) 
           AND `id` NOT IN (SELECT `postid` 
                            FROM   `user_posts_removed` 
                            WHERE  `profileid` = '100') 
           AND `live` = '1' 
           AND `added` >= Date_sub(Now(), INTERVAL 1 month) 
           AND `popularity` > 1 
    ORDER  BY `popularity` DESC 
    LIMIT  50 `enter code here`)
where rank <= 5
ayellowbusman
  • 47
  • 1
  • 8
  • This works great locally, however I am running this on a legacy server that uses MySQL version 5.7.24. Is there any alternative? – John Cliven Nov 21 '18 at 10:10
  • Start with this link. https://stackoverflow.com/questions/3333665/rank-function-in-mysql ... I will play around with my personal mysql database to see how you would include the order by popularity descending. – ayellowbusman Nov 22 '18 at 03:43