2

I have two tables:

 Students         Student_Grades
   V------------------------V
+----+------+    +----+------------+---------+-------+
| id | name |    | id | student_id | subject | grade |
+----+------+    +----+------------+---------+-------+
| 0  | Dave |    | 0  | 0          | Math    | 100   |
+----+------+    +----+------------+---------+-------+
| 1  | John |    | 1  | 0          | Chem    | 90    |
+----+------+    +----+------------+---------+-------+
| 2  | Kate |    | 2  | 0          | CompSCI | 95    |
+----+------+    +----+------------+---------+-------+
| 3  | Mimi |    | 3  | 1          | ELA     | 98    |
+----+------+    +----+------------+---------+-------+
                 | 4  | 2          | Biology | 92    |
                 +----+------------+---------+-------+
                 | 5  | 2          | Chem    | 94    |
                 +----+------------+---------+-------+
                 | 6  | 2          | Math    | 98    |
                 +----+------------+---------+-------+
                 | 7  | 3          | Math    | 100   |
                 +----+------------+---------+-------+ 

I would like to select all subjects and grades from a random student that is enrolled in more than three subjects. (Either Dave or Kate)

Students John and Mimi would not be even considered because they are not enrolled in three subjects.

I know I can achieve this with PHP but I would like this to be done with one query to the database.

SELECT * FROM Students t JOIN (SELECT CEIL(MAX(ID)*RAND()) AS ID FROM Students) AS x ON t.ID >= x.ID LIMIT 1

With the above query, I have selected a random student, with that I can go in and check if they have three subjects with SELECT count(subjects) FROM Students WHERE id=random_id.

If the count returned is below three, then I throw away the results and run the first query again.

How would I attempt this in one query?

Dave Chen
  • 10,887
  • 8
  • 39
  • 67
  • 2
    Why not just run a query to only get the ones that have 3 subjects and select a random one from that? – csteifel Jun 24 '13 at 04:49
  • Students that are enrolled in more subjects would have a higher chance of getting picked. – Dave Chen Jun 24 '13 at 04:50
  • 3
    @DaveChen I don't see how that follows. You'd select each student only once still. e.g. `select * from students where (select count(*) from student_grades where student_grades.id = students.id) >= 3` or something. Order randomly and take the top 1. – Patashu Jun 24 '13 at 04:50

1 Answers1

2

This is tested and working:

SELECT * 
FROM Students s 
JOIN (
    SELECT student_id 
    FROM Student_Grades 
    GROUP BY student_id 
    HAVING COUNT(*) >= 3 
    ORDER BY RAND() 
    LIMIT 1
) rs 
    ON rs.student_id = s.id 
JOIN 
    Student_Grades sg 
    ON sg.student_id = s.id

Here's the SQL Fiddle: http://sqlfiddle.com/#!2/e5b5b/1

chrislondon
  • 12,487
  • 5
  • 26
  • 65
  • You can improve this query by using a method for random row used in question instead of `ORDER BY RAND()`. – Gustek Jun 24 '13 at 05:14
  • @Gustek, I've never seen that used before but it doesn't seem like it would work for two reasons. 1) It's possible that there will be gaps in the ids (if one gets deleted) so it could try and select a row that doesn't exist and 2) We need to specifically select randomly only students that have 3 records in the Students_Grades table. But I'd love to optimize the query if it's possible. – chrislondon Jun 24 '13 at 05:23
  • There actually are gaps in my ID row (students that get expelled). – Dave Chen Jun 24 '13 at 05:26
  • He has `>=` condition not `=` so if he get not existent id in `CEIL(MAX(ID)*RAND())` it will return first one with bigger id. It may cause some rows selected more often on gaps but works faster than `ORDER BY RAND()`. More on [selecting random rows](http://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast). As about second problem if we move randomization to outer query it is not a problem any more. – Gustek Jun 24 '13 at 13:13