MySQL - Merge rows in table based on multiple criteria

Question

I'd like to merge rows based on multiple criteria, essentially removing duplicates where I get to define what "duplicate" means. Here is an example table:

     ╔═════╦═══════╦═════╦═══════╗
     ║ id* ║ name  ║ age ║ grade ║
     ╠═════╬═══════╬═════╬═══════╣
     ║  1  ║ John  ║ 11  ║   5   ║
     ║  2  ║ John  ║ 11  ║   5   ║
     ║  3  ║ John  ║ 11  ║   6   ║
     ║  4  ║ Sam   ║ 14  ║   7   ║
     ║  5  ║ Sam   ║ 14  ║   7   ║
     ╚═════╩═══════╩═════╩═══════╝

In my example, let's say I want to merge on name and age but ignore grade. The result should be:

     ╔═════╦═══════╦═════╦═══════╗
     ║ id* ║ name  ║ age ║ grade ║
     ╠═════╬═══════╬═════╬═══════╣
     ║  1  ║ John  ║ 11  ║   5   ║
     ║  3  ║ John  ║ 11  ║   6   ║
     ║  4  ║ Sam   ║ 14  ║   7   ║
     ╚═════╩═══════╩═════╩═══════╝

I don't particularly care if the id column is updated to be incremental, but I suppose that would be nice.

Can I do this in MySQL?

You're probably better off dumping the result into a temp table (based on that one answer down there), and then truncate/dump this data back in. — durbnpoisn, Sep 10 '15 at 19:57

score 1 · Accepted Answer · answered Sep 10 '15 at 20:00

1

My suggestion, based on my above comment.

SELECT distinct name, age, grade 
into tempTable
from theTable

This will ignore the IDs and give you only a distinct dump, and into a new table.

Then you can either drop the old and, and rename the new one. Or truncate the old one, and dump this back in.

answered Sep 10 '15 at 20:00

durbnpoisn

4,666
2
16
30

Conceptually, this makes sense. I've never used `INTO`. Does `tempTable` have to exist before you run that command? When I try it, I get `Undeclared variable: tempTable `. – jds Sep 10 '15 at 20:04
No. In fact, the table should not exist first. This will create the thing for you. The column definitions will be created automatically based on the columns you're using to create it. Depending on your database, you may need to name your table with a schema, like "dbo.tempTable". – durbnpoisn Sep 10 '15 at 20:05
Okay, maybe this explains it: http://stackoverflow.com/questions/2949653/select-into-and-undeclared-variable-error. I'm using MariaDB. – jds Sep 10 '15 at 20:10
I didn't realize there was a difference. I've edited my question to remove references to SQL and will propose an edit to your answer. – jds Sep 10 '15 at 20:19
Well, the question was already plussed up because it seems it's helpful. And the comments here show how we arrived at a solution - up to and including two different methods (in syntax). So, there is no need to edit. In any case - glad to be of help... – durbnpoisn Sep 10 '15 at 20:40

score 1 · Answer 2 · answered Sep 10 '15 at 20:28

You could just delete the duplicates in place like this:

delete test
from test 
inner join (
  select name, age, grade, min(id) as minid, count(*)
  from test
  group by name, age, grade
  having count(*) > 1
) main on test.id = main.minid;

Example: http://sqlfiddle.com/#!9/f1a38/1

MySQL - Merge rows in table based on multiple criteria

2 Answers2