SQL: Sort on number of row entries and join

Question

How do I find duplicate entries in a column (like multiple rows in student_id column) and join those to another table with common id (like student_id from another table to get student_name)?

grades:

student_id,grade
6,A
1,B
1,F
7,C
6,A

students:

student_id,student_name
1,roy
6,bob
7,art

so that I can get a joined table of only duplicate student rows on student_id that resembles:

student_id,student_name,grade
6,bob,A
1,roy,B
1,roy,F
6,bob,A

It would be nice to then just return the names of the most recent data frame:

bob
roy

i don't understand, you want just the student name? – Mattia Caputo Sep 16 '15 at 19:01 — Mattia Caputo, Sep 16 '15 at 19:01

zedfoxus · Accepted Answer · 2015-09-16T19:11:06.363

Try this:

select students.student_name
from grades 
inner join students on grades.student_id = students.student_id
group by student_name
having count(*) > 1

Example:

create table grades (student_id int, grade char(1));
insert into grades values (6, 'A'), (1, 'B'), (1, 'F'), (7, 'C'), (6, 'A');
create table students (student_id int, student_name varchar(20));
insert into students values (1, 'roy'), (6,'bob'), (7,'art');

select students.student_name
from grades 
inner join students on grades.student_id = students.student_id
group by student_name
having count(*) > 1

Result:
student_name
bob
roy

This should result in bob and roy.

Command line on Linux, assuming you have sqlite:

~$ sqlite
SQLite version 2.8.17
Enter ".help" for instructions
sqlite> create table grades (student_id int, grade char(1));
sqlite> insert into grades values (6, 'A');
sqlite> insert into grades values (1, 'B');
sqlite> insert into grades values (1, 'F');
sqlite> insert into grades values (7, 'C');
sqlite> insert into grades values (6, 'A');
sqlite>
sqlite> create table students (student_id int, student_name varchar(20));
sqlite> insert into students values (1, 'roy');
sqlite> insert into students values (6,'bob');
sqlite> insert into students values (7,'art');
sqlite>
sqlite> select students.student_name
   ...> from grades
   ...> inner join students on grades.student_id = students.student_id
   ...> group by student_name
   ...> having count(*) > 1;
roy
bob
sqlite>

thanks! what is the best way to test your suggestion at the linux command line? I have the grades and students tables as separate csv files. At first, I was using pandas and sqlite3, but there has to be a cleaner, fast way! What is it? — Thomas Matthew, Sep 16 '15 at 19:04
You can test this with sqlfiddle.com. I have created this example on fiddle: http://sqlfiddle.com/#!9/1365a/1 — zedfoxus, Sep 16 '15 at 19:07
I have edited my answer to show an example you can try with SQLite on Linux — zedfoxus, Sep 16 '15 at 19:10

Eric Hotinger · Answer 2 · 2015-09-16T19:02:49.413

1

Find duplicates and then join it back for more information:

SELECT s.student_id, s.student_name
FROM student s
INNER JOIN
   (SELECT g.student_id
    FROM grades g
    GROUP BY g.student_id
    HAVING COUNT(*) > 1) dups
 ON s.student_id = dups.student_id

edited Sep 16 '15 at 19:02

answered Sep 16 '15 at 19:01

Eric Hotinger

8,957
5
36
43

`COUNT(*)` rather than `COUNT(grades)`. See http://stackoverflow.com/questions/2876909/count-and-countcolumn-name-whats-the-diff – Barmar Sep 16 '15 at 19:02

SQL: Sort on number of row entries and join

2 Answers2