1

I'm trying to find and remove duplicate rows by two columns in a table

one recommended solution is to put a unique index and then remove it, like this:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

(Find and remove duplicate rows by two columns)

One of the columns is DATETIME, and i want to use the date part of it as the 'unique' feature to remove duplicates by. How do I do that?

for example:

[column1] [column2]
[2020-01-01 11:11:11] [john]
[2020-01-01 16:11:11] [john]
[2020-07-07 17:17:11] [mike]

i want one of the first 2 rows to be deleted because the date part of the timestamp (in column1) and the name (column2) , are the same.

expected result:

[column1] [column2]
[2020-01-01 11:11:11] [john]
[2020-07-07 17:17:11] [mike]
Seeker
  • 57
  • 5

2 Answers2

1
delete t1
from your_table t1
left join
(
  select min(col1) mcol, col2
  from your_table
  group by col2
) tmp on t1.col1 = tmp.mcol 
     and t1.col2 = tmp.col2
where tmp.col2 is null
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
juergen d
  • 201,996
  • 37
  • 293
  • 362
1

We can just use exists logic here:

DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.column2 = t1.column2 AND
                    DATE(t2.column1) = DATE(t1.column1) AND
                    t2.column1 < t1.column1);

Read in plain English, the above query says to delete any record for which we can find another record with the same name (column2 value), same date, but an earlier timestamp, with time portion included. This exists check would fail for a given name having the earliest timestamp value.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360