I need to select several rows per each distinct record. Pretty much something like asked here Select first n records for each distinct ID in SQL Server 2008, although I use MySQL.
The purpose can be achieved by running 21 queries in this case: 1 general and 20 to get the subrecords, i.e. something like this:
SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20
... to select all rows needed, and then
SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5
... in a loop per each row selected by the first query.
Basically, I sort of need to get 5 posts of each user. I need this done in a single query. The posts
setup is just for example, I made this up so it's, hopefully, easier to understand what I need.
I started with the following query:
SELECT *
FROM `posts`
WHERE `user_id`
IN (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4
)
LIMIT 5
But I get the #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
error.
So I've tried the JOIN
idea like suggested here:
SELECT posts.id,
posts.user_id,
NULL
FROM (
SELECT posts.user_id
FROM posts
WHERE posts.deleted = 0
LIMIT 20
) q
JOIN posts
ON posts.user_id = q.user_id
I've also tried several nested queries as suggested here:
SELECT *
FROM posts
WHERE user_id IN (
SELECT * FROM (
SELECT user_id
FROM posts
LIMIT 20
)
as t);
And the other solutions found on the Internet. But they either do not work or just simply select the first N rows from the database (regardless of the conditions and joins for some reason). Tried LEFT JOIN
, RIGHT JOIN
, even INNER JOIN
, but still no success.
Please help.
UPDATE Forgot to mention that the table is around 5GB in size.
UPDATE Tried the sub-sub query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
Same as above, it returns the following:
+----+---------+------+
| id | user_id | post |
+----+---------+------+
| 1 | 1 | a |
+----+---------+------+
| 2 | 1 | b |
+----+---------+------+
| 3 | 1 | c |
+----+---------+------+
| .. | .. | .. |
I.e. the 5 (which is what the outer LIMIT
is set to) rows for the same user. The weird thing is that if I run the sub and sub-sub query alone:
SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
I get the 4 distinct values:
+---------+
| user_id |
+---------+
| 1 |
+---------+
| 2 |
+---------+
| 3 |
+---------+
| 4 |
+---------+