Find and remove duplicate rows by two columns where one of those columns is a TIMEDATE

Question

I'm trying to find and remove duplicate rows by two columns in a table

one recommended solution is to put a unique index and then remove it, like this:

ALTER IGNORE TABLE your_table ADD UNIQUE (field1,field2,field3);

(Find and remove duplicate rows by two columns)

One of the columns is DATETIME, and i want to use the date part of it as the 'unique' feature to remove duplicates by. How do I do that?

for example:

[column1] [column2]
[2020-01-01 11:11:11] [john]
[2020-01-01 16:11:11] [john]
[2020-07-07 17:17:11] [mike]

i want one of the first 2 rows to be deleted because the date part of the timestamp (in column1) and the name (column2) , are the same.

expected result:

[column1] [column2]
[2020-01-01 11:11:11] [john]
[2020-07-07 17:17:11] [mike]

Do you have a primary key column? – juergen d Nov 09 '20 at 10:16 — juergen d, Nov 09 '20 at 10:16

score 1 · Answer 1 · edited Nov 09 '20 at 11:58

1

delete t1
from your_table t1
left join
(
  select min(col1) mcol, col2
  from your_table
  group by col2
) tmp on t1.col1 = tmp.mcol 
     and t1.col2 = tmp.col2
where tmp.col2 is null

edited Nov 09 '20 at 11:58

Gordon Linoff

1,242,037
58
646
786

answered Nov 09 '20 at 10:19

juergen d

201,996
37
293
362

score 1 · Accepted Answer · answered Nov 09 '20 at 10:22

We can just use exists logic here:

DELETE
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2
              WHERE t2.column2 = t1.column2 AND
                    DATE(t2.column1) = DATE(t1.column1) AND
                    t2.column1 < t1.column1);

Read in plain English, the above query says to delete any record for which we can find another record with the same name (column2 value), same date, but an earlier timestamp, with time portion included. This exists check would fail for a given name having the earliest timestamp value.

Find and remove duplicate rows by two columns where one of those columns is a TIMEDATE

2 Answers2