MySQL Delete duplicates in consecutive rows

Question

Suppose this table:

ID ColA ColB
1   7    8
2   7    9
3   7    9
4   5    8
5   6    9
6   6    9
7   5    4

The PK is the ID coumn. Now, I want to delete all duplicates of ColA and ColB in consecutive rows.

In this example rows 2,3 and 5,6 contain duplicates. These shall be removed so that the higher ID is remained.

The output should be:

ID ColA ColB
1   7    8

3   7    9
4   5    8

6   6    9
7   5    4

How can this be done with mySQL?

Thanks, Juergen

Sorry man but have you tried or, at least, thought someting? — Aurelio De Rosa, Nov 23 '11 at 14:09

score 3 · Accepted Answer · edited Jun 26 '19 at 12:35

3

SELECT 
    ID
FROM
    MyTable m1
WHERE
    0 < (SELECT 
            COUNT(*)
        FROM
            MyTable m2
        WHERE
            m2.ID = m1.ID - 1 AND 
            m2.ColA = m1.ColA AND 
            m2.ColB = m1.ColB)

and then you can use a

delete from MyTable where ID in ...

query. This way it would surely work in any version.

edited Jun 26 '19 at 12:35

nowox

25,978
39
143
293

answered Nov 23 '11 at 14:10

Lajos Arpad

64,414
37
100
175

There is one problem remaining: I tried your first statement with SELECT, but when writing DELETE MyTable m1 ... the parser complains about m1 for some reason(wrong SQL syntax). – Juergen Nov 23 '11 at 14:29
Probably MySQL doesn't support this type of naming for deletes, however, it's better to use the new approach, in the EDIT, because of the problem I've described in the meantime. The new approach doesn't use this kind of naming. In the first step you select the ID's which should be deleted and using the results of your query you can write the delete statement. Question: Is this written directly in MySQL, or you are running your commands from another technology (like Java, C#, C++, PHP)? – Lajos Arpad Nov 23 '11 at 14:41
Directly in MySQL; the error I get with your edit approach is the error 1093: You can't specify target table 'MyTable' for update in FROM clause; I'll try the other two recommendations with a temporary table now. – Juergen Nov 23 '11 at 14:44
@Juergen since the fix wasn't explained, you can avoid this error by aliasing the table in the subquery, see http://stackoverflow.com/q/45494/2578505 Ex: change `FROM TargetTable` to `FROM (SELECT * FROM TargetTable) as TargetTable_Temp` – quetzaluz May 20 '15 at 17:18
This solution is correct only if IDs are in sequential order. If IDs are not in sequential order, you'll have to replace `m2.ID = m1.ID - 1` with something like `m2.ID = `. – Despotovic Oct 15 '20 at 07:54
@Despotovic true. The question indicates sequentiality though. – Lajos Arpad Oct 15 '20 at 10:53

score 2 · Answer 2 · answered Nov 23 '11 at 14:31

CREATE TEMPORARY TABLE duplicates (id int primary key)

INSERT INTO duplicates (id)
    SELECT t1.id
      FROM table t1
      join table t2 on t2.id = t1.id + 1
     WHERE t1.ColA = t2.ColA
       and t1.ColB = t2.ColB

-- SELECT * FROM duplicates --> are you happy with that? => delete
DELETE table
  FROM table
  join duplicates on table.id = duplicates.id

score 0 · Answer 3 · answered Nov 23 '11 at 14:14

0

Depending on how many records you have, this might not be the most efficient:

SELECT (SELECT TOP 1 id FROM table WHERE colA = m.colA AND colB = m.colB ORDER BY id DESC) AS id, m.*
FROM (SELECT DISTINCT colA, colB
      FROM  table) m

There might be syntax errors because I usually use mssql, but the idea should be similar.

answered Nov 23 '11 at 14:14

JohnathanKong

1,307
3
21
36

There is no top keyword in MySQL, Limit is used instead. Also, you don't delete duplicates. Also, your selection doesn't even select the duplicates (consecutive ID's except the first in the sequence) defined in the question. I'm sorry, but I have to downvote you. – Lajos Arpad Nov 23 '11 at 14:21
whoops, I missed the delete portion. Misread it, and yes, I should have used LIMIT instead of TOP; that was my ms stuff kicking in. But if you DELETE tabe WHERE NOT IN (above query), you'd be left with the the ids that are not in the nested select, and the nested select selects the first highest id because I order by and we're only selecting the first one. Just remove the m.*. – JohnathanKong Nov 23 '11 at 14:26

score 0 · Answer 4 · answered Nov 23 '11 at 14:29

I've called the first table 'test'.

Firstly create a table that will hold all the identical combinations of ColA and ColB:

create temporary table tmpTable (ColA int, ColB int);
insert into tmpTable select ColA,ColB from test group by ColA, ColB;

Now, select the maximum id in the original table for each identical combination of ColA and ColB. Put this into a new table (called idsToKeep because these are the rows we do not want to delete):

create temporary table idsToKeep (ID int);
insert into idsToKeep select (select max(ID) from test where test.ColA=tmpTable.ColA and test.ColB=tmpTable.ColB) from tmpTable;

Finally, delete all the entries from the original table that are not in the idsToKeep table:

delete from test where ID <> all (select ID from idsToKeep);

This approach guarantees that the row with the highest ID is not deleted. — drmatt, Nov 23 '11 at 14:38
That won't work if the row are not consecutive but the same ColA and ColB comes back in the sequence. — jsgoupil, Aug 22 '17 at 04:45

MySQL Delete duplicates in consecutive rows

4 Answers4

Linked