2

my table has duplicate row values in specific columns. i would like to remove those rows and keep the row with the latest id. the columns i want to check and compare are:

sub_id, spec_id, ex_time

so, for this table

+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
|  1 |    100 |     444 | 09:29   |     2 |
|  2 |    101 |     555 | 10:01   |    10 |
|  3 |    100 |     444 | 09:29   |    23 |
|  4 |    200 |     321 | 05:15   |     5 |
|  5 |    100 |     444 | 09:29   |     8 |
|  6 |    101 |     555 | 10:01   |     1 |
+----+--------+---------+---------+-------+

i would like to get this result

+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
|  5 |    100 |     444 | 09:29   |     8 |
|  6 |    101 |     555 | 10:01   |     1 |
+----+--------+---------+---------+-------+

i was able to build this query to select all duplicate rows from multiple columns, according to this question

select  t.*
from mytable t join
     (select id, sub_id, spec_id, ex_time, count(*) as NumDuplicates
      from mytable
      group by sub_id, spec_id, ex_time
      having NumDuplicates > 1
     ) tsum
     on t.sub_id = tsum.sub_id and t.spec_id = tsum.spec_id and t.ex_time = tsum.ex_time  

but now im not sure how to wrap this select with a delete query to delete the rows except for the ones with highest id. as shown here

buzibuzi
  • 724
  • 3
  • 15
  • 27

1 Answers1

2
  • You can modify your sub-select query, to get maximum value of id for each duplication combination.
  • Now, while joining to the main table, simply put a condition that id value will not be equal to the maximum id value.
  • You can now Delete from this result-set.

Try the following:

DELETE t 
FROM mytable AS t 
JOIN 
    (SELECT MAX(id) as max_id, 
            sub_id, 
            spec_id, 
            ex_time, 
            COUNT(*) as NumDuplicates
     FROM mytable
     GROUP BY sub_id, spec_id, ex_time
     HAVING NumDuplicates > 1
    ) AS tsum
     ON t.sub_id = tsum.sub_id AND 
        t.spec_id = tsum.spec_id AND 
        t.ex_time = tsum.ex_time AND 
        t.id <> tsum.max_id 
Madhur Bhaiya
  • 28,155
  • 10
  • 49
  • 57