2

I'm trying to find the students that have failed every subject in a set of subjects via PostgreSQL queries.

Students fail a subject if they have a not null mark < 50 for at least one course offering of the subject. And I want to find the students that have failed all subjects in the set of subjects Relevant_subjects.
NOTE: students can have several records per course.

SELECT People.name
FROM 
    Relevant_subjects
    JOIN Courses on (Courses.subject = Relevant_subjects.id)
    JOIN Course_enrolments on (Course_enrolments.course = Courses.id)
    JOIN Students on (Students.id = Course_enrolments.student)
    JOIN People on (People.id = Students.id)
WHERE
    Course_enrolments.mark is not null AND
    Course_enrolments.mark < 50 AND
;

With the code above, I get the students that has failed any of the Relevant_subjects but I my desired result is to get the students that has failed all Relevant_subjects. How can I do that?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Casper Lindberg
  • 1,053
  • 1
  • 11
  • 16

2 Answers2

1

I would use aggregation:

SELECT p.name
FROM Relevant_subjects rs JOIN
     Courses c
     ON c.subject = rs.id JOIN
     Course_enrolments ce
     ON ce.course = c.id JOIN
     Students s
     ON s.id = ce.student JOIN
     People p
     ON p.id = s.id
WHERE ce.mark < 50
GROUP BY p.id, p.name
HAVING COUNT(*) = (SELECT COUNT(*) FROM relevant_subjects);

Note: This version assumes that students only have one record per course and relevant_subjects has no duplicates. These can easily be handling using COUNT(DISTINCT) if necessary.

To handle duplicates, this would look like:

SELECT p.name
FROM Relevant_subjects rs JOIN
     Courses c
     ON c.subject = rs.id JOIN
     Course_enrolments ce
     ON ce.course = c.id JOIN
     Students s
     ON s.id = ce.student JOIN
     People p
     ON p.id = s.id
WHERE ce.mark < 50
GROUP BY p.id, p.name
HAVING COUNT(DISTINCT rs.id) = (SELECT COUNT(DISTINCT rs2.id) FROM relevant_subjects rs2);
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hi Gordon! It seems like It works, if I'm able to implement the COUNT(DISTINCT) too. Where should I add it. COUNT(DISTINCT) doesn't work. Neither do COUNT(DISTINCT *)? – Casper Lindberg Oct 17 '19 at 01:47
1

Students fail a subject if they have a not null mark < 50 for at least one course offering of the subject.

One of many possible ways:

SELECT id, p.name
FROM  (
   SELECT s.id
   FROM   students                s
   CROSS  JOIN relevant_subjects rs
   GROUP  BY s.id
   HAVING bool_and( EXISTS(
            SELECT -- empty list
            FROM   course_enrolments ce
            JOIN   courses           c  ON c.id = ce.course
            WHERE  ce.mark < 50  -- also implies NOT NULL
            AND    ce.student = s.id
            AND    c.subject = rs.id
            )
         ) -- all failed
   ) sub
JOIN   people p  USING (id);
  1. Form a Carthesian Product of students and relevant subjects.

  2. Aggregate by student (s.id) and filter those who failed all subjects in the HAVING clause with bool_and()over a correlated EXISTS subquery testing for at least one such failed course for each student-subject combination.

  3. Join to people as final cosmetic step to get student names. I added id to get unique results (as names probably are not guaranteed to be unique).

Depending on actual table definition, your version of Postgres, cardinalities and value distribution, there may be (much) more efficient queries.

It's a case of at its core. See:

And the most efficient strategy is to eliminate as many students as possible as early in the query as possible - like by checking the subject with the fewest failing students first. Then proceed with only the remaining students etc.

Your case adds the specific difficulty that the number and identities of subjects to be tested are unknown / dynamic. Typically, a recursive CTE or similar offers best performance for this kind of problem:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228