3

I've got a problem with a correlation inside a sub-query.

I have 2 tables :
- table1 : contains "groups" with a groupid, a groupename and a categoryid
- table2 : tells which people is member of which group (with fields: userid, groupid)

I would like to ask my database to give me :
all groups from a specific "category" with for each of them :
- the groupid, the groupname
- and a random selection of 4 members for each group

I followed the question MySQL select 10 random rows from 600K rows fast to generate 4 random members of a specific group.


It works well if I run the query separately.

but if I try to incorporate my sub-query inside my "general query" :

SELECT 
g.groupid, g.groupname, 
(
SELECT GROUP_CONCAT(table2.userid SEPARATOR ",") 
    FROM table2, (
        SELECT userid AS uid
        FROM table2 
        WHERE table2.groupid = g.groupid 
        ORDER BY RAND( ) 
        LIMIT 4 
    ) tmp 
WHERE table2.userid = tmp.uid
) AS randomusers 
FROM table1 AS g WHERE g.categoryid = ? 

... I get a "Unknown column 'g.groupid' in 'where clause'" ERROR.


I tried to pass the subquery into a LEFT JOIN but I can't figure out how to do it properly as each of my attempts are unsuccessful.

Any help on this? Thanks :)

Community
  • 1
  • 1
SebT
  • 45
  • 1
  • 7

1 Answers1

0

If I understood right, your query looks overly complicated

SELECT t1.groupid, 
       t1.groupname, 
       (SELECT GROUP_CONCAT(table2.userid) FROM table2 WHERE table2.userid = t1.userid ORDER BY RAND() LIMIT 4) AS `users`
FROM table1 AS t1

Have not tested it, but this query should return first four random elements for each group

dkasipovic
  • 5,930
  • 1
  • 19
  • 25
  • Thanks for your answer! I agree on the fact that my query is very complicated, but it is said that doing a straight "ORDER BY RAND()" is very slow. For optimizing this type of query, it is highly recommanded to do a subquery in order to do sorting only on the “id”s. (see interested link provided in my post or even http://theoryapp.com/select-random-records-in-mysql/ – SebT Feb 25 '14 at 11:32
  • Indeed, my query will pull all records from the table2, order them by rand, and then take the first 4. Order by rand is always slower, but if you want to use it in table you can remove it from group_concat, and then change "LEFT JOIN table2" to "LEFT JOIN (SELECT ... ORDER BY RAND() LIMIT 4) table2". – dkasipovic Feb 25 '14 at 11:35
  • In my opinion, you have one derived table (subquery) too much. Try with: SELECT table1.groupid, table1.groupname, GROUP_CONCAT(t2.userid) AS `users` FROM table1 LEFT JOIN ( SELECT table2.userid WHERE table2.groupid = table1.groupid ORDER BY RAND() LIMIT 4 ) t2 GROUP BY table1.groupid – dkasipovic Feb 25 '14 at 11:43
  • Even if I write: SELECT table1.groupid, table1.groupname, GROUP_CONCAT(t2.userid) AS users FROM table1 LEFT JOIN ( SELECT table2.userid FROM table2 WHERE table2.groupid = table1.groupid ORDER BY RAND() LIMIT 4 ) t2 GROUP BY table1.groupid – SebT Feb 25 '14 at 12:04
  • Actually I believe that the 2nd nested subquery "SELECT userid AS uid FROM table2 WHERE table2.groupid = g.groupid ORDER BY RAND( ) LIMIT 4" is here to sort only on ids which has much smaller size instead of sorting on the all table... – SebT Feb 25 '14 at 12:12
  • Thanks for your edit. It doesn't give 4 random people but all people of the group because of this problem: http://stackoverflow.com/questions/3378324/limit-ignored-in-query-with-group-concat Need again to add a subquery to solve it... And the ORDER BY RAND () is not really optimized anyway. Actually after my research, I ended to the conclusion that the best way of doing it was the one proposed in my post. The only problem is that I can't figure out how to cope with the correlation problem in the 2nd nested subquery – SebT Feb 25 '14 at 12:20
  • Yes, you are right, LIMIT only limits number of results (which when using GROUP_CONCAT is always 1). Did you think about the approach where you would get the users in postprocessing? For example, get all groups and then for each group get 4 random users? Is that applicable? – dkasipovic Feb 25 '14 at 12:27
  • Sure it is... But I really wanted to avoid doing this:) Do you have an idea about how to get a 2 levels deep subquery (with correlation) inside a left join ? – SebT Feb 25 '14 at 12:31
  • I want to avoid post-processing because the original query may return 500 groups so I don't want to launch 1+500 queries to the db... Would be much nicer to do a left join (if it is sufficiently optimized to get acceptable performances) – SebT Feb 25 '14 at 12:36
  • This might sound funny, and I am not sure how applicable it might be, but you could always do the following: SELECT table1.groupid, table1.groupname, (SELECT userid FROM table2 WHERE table2.groupid = table1.groupid ORDER BY RAND() LIMIT 1) AS u1, (SELECT userid FROM table2 WHERE table2.groupid = table1.groupid ORDER BY RAND() LIMIT 1) AS u2, (SELECT userid FROM table2 WHERE table2.groupid = table1.groupid ORDER BY RAND() LIMIT 1) AS u3, (SELECT userid FROM table2 WHERE table2.groupid = table1.groupid ORDER BY RAND() LIMIT 1) AS u4, FROM table1 – dkasipovic Feb 25 '14 at 12:47
  • Thanks for your help and your suggestion. Unfortunately, not really applicable as it launches 4 subqueries ORDER BY RAND(), each of them being not optimized (cf. my comments about how optimizing a ORDER BY RAND() sorting) – SebT Feb 25 '14 at 13:06
  • You're welcome, I dont think it is possible the way you imagine it without loosing any performance. If you find an answer be sure to share it. Best of luck. – dkasipovic Feb 25 '14 at 13:09
  • If I found an answer I'll share it :) – SebT Feb 25 '14 at 13:13