2

I have these two tables:

Student:

|                name |                   email |
|---------------------|-------------------------|
|    Arturo     Vidal |     arturo.vidal@usm.cl |
|   Bastian   Quezada |          bastian@usm.cl |
|    Javier     Jeria |           javier@usm.cl |
| Sebastian    Piñera | sebastian@presidente.cl |
| Sebastian  Gallardo |        sebastian@usm.cl |

Class:

| classId |                   email |  signUpDate |
|---------|-------------------------|-------------|
|       1 |        sebastian@usm.cl |  2018-01-01 |
|       1 |           javier@usm.cl |  2019-10-01 |
|       1 |          bastian@usm.cl |  2018-07-01 |
|       2 |        sebastian@usm.cl |  2018-05-04 |
|       2 |          bastian@usm.cl |  2018-01-01 |
|       3 |          bastian@usm.cl |  2018-12-05 |
|       3 |        sebastian@usm.cl |  2018-02-01 |
|       4 |     arturo.vidal@usm.cl |  2018-03-01 |
|       5 | sebastian@presidente.cl |  2018-03-01 |

I want to show the name the last student that signed up for each classId. That means, I should get a name for classId 1, one for classId 2, etc. My solution for firstly getting the mails (to know the student's name after) is this:

select classId, email, max(signUpDate)
from Class
group by classId

it prints the max date, which is ok, but it also prints the wrong mails for each date:

| ClassId |                   email | max(signUpDate) |
|---------|-------------------------|-----------------|
|       1 |        sebastian@usm.cl |      2019-10-01 |
|       2 |        sebastian@usm.cl |      2018-05-04 |
|       3 |          bastian@usm.cl |      2018-12-05 |
|       4 |     arturo.vidal@usm.cl |      2018-03-01 |
|       5 | sebastian@presidente.cl |      2018-03-01 |

which is completely wrong (). Therefore, when I try to join the the values for getting the names, I get incorrect values.

In other words, I don't understand why are the rows mixing up. Is there any solution for getting correct emails for the max(signUpDate) for each ClassId?

Thanks for your time

Lou
  • 2,200
  • 2
  • 33
  • 66
M Aubel
  • 33
  • 2
  • 5
  • 1
    Step 1) Add a student_id column to the students table, should be primary key. – jarlh Mar 28 '18 at 12:13
  • It makes no sense combining `DISTINCT` with `GROUP BY`. When grouping you always get one record for each group anyway. – Giorgos Betsos Mar 28 '18 at 12:17
  • 1
    Also, using a non aggregated field, like `email`, that is not contained in `GROUP BY`, is syntactically incorrect in most RDMBS. MySQL allows it, but this doesn't mean that the `email` value being selected is the one that corresponds to `max(signUpDate)`. – Giorgos Betsos Mar 28 '18 at 12:19
  • check this https://stackoverflow.com/questions/32673980/select-max-value-in-subquery-in-sql – zubair khanzada Mar 28 '18 at 12:22
  • Step 2) Replace the email column in Class table with the new student_id column. Foreign key! – jarlh Mar 28 '18 at 12:23

3 Answers3

3

i have created the test data fiddle and made an easy and understandable query to fetch the required data, i.e:

SELECT DISTINCT classId,
                std.name,
                Class.email,
                signUpDate
FROM CLASS
INNER JOIN Student std ON std.email = Class.email
WHERE signUpDate IN
    (SELECT max(signUpDate)
     FROM CLASS
     GROUP BY classId)

Sql Fiddle here

Andy K
  • 4,944
  • 10
  • 53
  • 82
ARr0w
  • 1,701
  • 15
  • 31
1

This is an instance of a very common class of questions: find the whole row FOR EACH GROUP of the field that maximizes some value (in the group). In your case, you want to GROUP BY the ClassId, and FOR EACH ONE OF THESE GROUPS, you want the whole row of the field with the maximum signupDate.

SHORT ANSWER: You can use this query:

SELECT 
    C.ClassId, 
    S.name
FROM
(
    SELECT A.* 
    FROM Class AS A 
    LEFT JOIN Class AS B 
    ON A.email = B.email AND A.signupDate < B.signupDate 
    WHERE B.email IS NULL
) AS C
LEFT JOIN Student AS S ON S.email=C.email

LONG ANSWER:

Here you can find a very clear explanation of what I have just said.

Assuming that we can use the e-mail at your tables as unique identifier, you can do FIRST a join (on the e-mail field) of the table "Class" with itself, to select the "maximum date" for each class id. After that, you join (on the e-mail field) with the table "Student". After that, you will have a table with all the fields of the "Class" table and all the fields of the "Student" table. You can select the fields that you need. In the following example, I will select "Class.classId" and "Student.name"

If you run this query:

SELECT A.* 
FROM Class AS A 
LEFT JOIN Class AS B 
ON A.email = B.email AND A.signupDate < B.signupDate 
WHERE B.email IS NULL

You obtain this table:

+---------+-------------------------+------------+
| ClassId | email                   | signupDate |
+---------+-------------------------+------------+
|       1 | javier@usm.cl           | 2019-10-01 |
|       2 | sebastian@usm.cl        | 2018-05-04 |
|       3 | bastian@usm.cl          | 2018-12-05 |
|       4 | arturo.vidal@usm.cl     | 2018-03-01 |
|       5 | sebastian@presidente.cl | 2018-03-01 |
+---------+-------------------------+------------+

Now you can join this with the table "Student", and select the fields that you want. If you run the query provided in the "short answer" part of this post, you get the following result:

+---------+--------------------+
| ClassId | name               |
+---------+--------------------+
|       4 | Arturo Vidal       |
|       3 | Bastian Quezada    |
|       1 | Javier Jeria       |
|       5 | Sebastian Piñera   |
|       2 | Sebastian Gallardo |
+---------+--------------------+
  • Sorry for my bad english... In the first paragraph I was trying to say that your problem is a "common problem" in SQL You want not only "the maximum of each group", but also "all the row that corresponds to that maximum". This problem is well known. You can see the link that I have provided in the post for further explanation. – Eric Biagioli Mar 28 '18 at 13:17
  • voting up for great struggle :) – ARr0w Apr 18 '18 at 11:10
  • I recently learned that this class of question has its own tag, [tag:greatest-n-per-group], which I think is pretty neat that so many people need to know this that a tag was created for it! :) I've suggested editing it into the question anyway. – Lou Dec 10 '21 at 11:39
0

Try this:

SELECT A.classId, C.name, C.email, B.signUpDate
FROM
(SELECT classId, max(signUpDate) maxSignUpDate
FROM Class
GROUP BY classId) A JOIN Class B
ON A.classId=B.classId AND A.maxSignUpDate=B.signUpDate
JOIN Student C ON C.email=B.email;

I assume the email to be the ID field of the Student table. See MySQL Join Made Easy and MySQL GROUP BY for insights.

See it run on SQL Fiddle.

cdaiga
  • 4,861
  • 3
  • 22
  • 42