1

I'd like to merge rows based on multiple criteria, essentially removing duplicates where I get to define what "duplicate" means. Here is an example table:

     ╔═════╦═══════╦═════╦═══════╗
     ║ id* ║ name  ║ age ║ grade ║
     ╠═════╬═══════╬═════╬═══════╣
     ║  1  ║ John  ║ 11  ║   5   ║
     ║  2  ║ John  ║ 11  ║   5   ║
     ║  3  ║ John  ║ 11  ║   6   ║
     ║  4  ║ Sam   ║ 14  ║   7   ║
     ║  5  ║ Sam   ║ 14  ║   7   ║
     ╚═════╩═══════╩═════╩═══════╝

In my example, let's say I want to merge on name and age but ignore grade. The result should be:

     ╔═════╦═══════╦═════╦═══════╗
     ║ id* ║ name  ║ age ║ grade ║
     ╠═════╬═══════╬═════╬═══════╣
     ║  1  ║ John  ║ 11  ║   5   ║
     ║  3  ║ John  ║ 11  ║   6   ║
     ║  4  ║ Sam   ║ 14  ║   7   ║
     ╚═════╩═══════╩═════╩═══════╝

I don't particularly care if the id column is updated to be incremental, but I suppose that would be nice.

Can I do this in MySQL?

jds
  • 7,910
  • 11
  • 63
  • 101

2 Answers2

1

My suggestion, based on my above comment.

SELECT distinct name, age, grade 
into tempTable
from theTable

This will ignore the IDs and give you only a distinct dump, and into a new table.

Then you can either drop the old and, and rename the new one. Or truncate the old one, and dump this back in.

durbnpoisn
  • 4,666
  • 2
  • 16
  • 30
  • Conceptually, this makes sense. I've never used `INTO`. Does `tempTable` have to exist before you run that command? When I try it, I get `Undeclared variable: tempTable `. – jds Sep 10 '15 at 20:04
  • No. In fact, the table should not exist first. This will create the thing for you. The column definitions will be created automatically based on the columns you're using to create it. Depending on your database, you may need to name your table with a schema, like "dbo.tempTable". – durbnpoisn Sep 10 '15 at 20:05
  • Okay, maybe this explains it: http://stackoverflow.com/questions/2949653/select-into-and-undeclared-variable-error. I'm using MariaDB. – jds Sep 10 '15 at 20:10
  • I didn't realize there was a difference. I've edited my question to remove references to SQL and will propose an edit to your answer. – jds Sep 10 '15 at 20:19
  • Well, the question was already plussed up because it seems it's helpful. And the comments here show how we arrived at a solution - up to and including two different methods (in syntax). So, there is no need to edit. In any case - glad to be of help... – durbnpoisn Sep 10 '15 at 20:40
1

You could just delete the duplicates in place like this:

delete test
from test 
inner join (
  select name, age, grade, min(id) as minid, count(*)
  from test
  group by name, age, grade
  having count(*) > 1
) main on test.id = main.minid;

Example: http://sqlfiddle.com/#!9/f1a38/1

zedfoxus
  • 35,121
  • 5
  • 64
  • 63