0

How do I find duplicate entries in a column (like multiple rows in student_id column) and join those to another table with common id (like student_id from another table to get student_name)?

grades:

student_id,grade
6,A
1,B
1,F
7,C
6,A

students:

student_id,student_name
1,roy
6,bob
7,art

so that I can get a joined table of only duplicate student rows on student_id that resembles:

student_id,student_name,grade
6,bob,A
1,roy,B
1,roy,F
6,bob,A

It would be nice to then just return the names of the most recent data frame:

bob
roy
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Thomas Matthew
  • 2,826
  • 4
  • 34
  • 58

2 Answers2

2

Try this:

select students.student_name
from grades 
inner join students on grades.student_id = students.student_id
group by student_name
having count(*) > 1

Example:

create table grades (student_id int, grade char(1));
insert into grades values (6, 'A'), (1, 'B'), (1, 'F'), (7, 'C'), (6, 'A');
create table students (student_id int, student_name varchar(20));
insert into students values (1, 'roy'), (6,'bob'), (7,'art');

select students.student_name
from grades 
inner join students on grades.student_id = students.student_id
group by student_name
having count(*) > 1

Result:
student_name
bob
roy

This should result in bob and roy.

Command line on Linux, assuming you have sqlite:

~$ sqlite
SQLite version 2.8.17
Enter ".help" for instructions
sqlite> create table grades (student_id int, grade char(1));
sqlite> insert into grades values (6, 'A');
sqlite> insert into grades values (1, 'B');
sqlite> insert into grades values (1, 'F');
sqlite> insert into grades values (7, 'C');
sqlite> insert into grades values (6, 'A');
sqlite>
sqlite> create table students (student_id int, student_name varchar(20));
sqlite> insert into students values (1, 'roy');
sqlite> insert into students values (6,'bob');
sqlite> insert into students values (7,'art');
sqlite>
sqlite> select students.student_name
   ...> from grades
   ...> inner join students on grades.student_id = students.student_id
   ...> group by student_name
   ...> having count(*) > 1;
roy
bob
sqlite>
zedfoxus
  • 35,121
  • 5
  • 64
  • 63
  • thanks! what is the best way to test your suggestion at the linux command line? I have the grades and students tables as separate csv files. At first, I was using pandas and sqlite3, but there has to be a cleaner, fast way! What is it? – Thomas Matthew Sep 16 '15 at 19:04
  • 1
    You can test this with sqlfiddle.com. I have created this example on fiddle: http://sqlfiddle.com/#!9/1365a/1 – zedfoxus Sep 16 '15 at 19:07
  • 1
    I have edited my answer to show an example you can try with SQLite on Linux – zedfoxus Sep 16 '15 at 19:10
1

Find duplicates and then join it back for more information:

SELECT s.student_id, s.student_name
FROM student s
INNER JOIN
   (SELECT g.student_id
    FROM grades g
    GROUP BY g.student_id
    HAVING COUNT(*) > 1) dups
 ON s.student_id = dups.student_id
Eric Hotinger
  • 8,957
  • 5
  • 36
  • 43
  • `COUNT(*)` rather than `COUNT(grades)`. See http://stackoverflow.com/questions/2876909/count-and-countcolumn-name-whats-the-diff – Barmar Sep 16 '15 at 19:02