-2

How would I delete yesterday's duplicate data from a MySQL table and keep latest record?

For example, with the following data:

SELECT * FROM data;

| pid | serviceid | id             | created
| ----| ----------|----------------|---------
| 10  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 11  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 12  | KSPSLS    | QLSPDF0S0SDFKK | 2022-06-15 11:44:21
| 13  | AKNKSL    | QLSPDF0S0SDFKK | 2022-06-15 12:51:42
| 14  | AKNKSL    | QLSPDF0S0SDFKK | 2022-06-15 12:51:42

I used

DELETE n1 FROM data n1, data n2 WHERE n1.pid < n2.pid AND n1.id = n2.id

How could I DELETE to only remove duplicates and keep latest pid for yesterday's data?

Expected output would be

SELECT * FROM data;

| pid | serviceid | id             | created
| ----| ----------|----------------|---------
| 10  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 11  | ABNCKN    | Q0CHZ2GI1VKOPL | 2022-06-14 10:49:34
| 12  | KSPSLS    | QLSPDF0S0SDFKK | 2022-06-15 11:44:21
| 14  | AKNKSL    | QLSPDF0S0SDFKK | 2022-06-15 12:51:42

remove | 13 | AKNKSL | QLSPDF0S0SDFKK | 2022-06-15 12:51:42

Himanshu
  • 31,810
  • 31
  • 111
  • 133
SRash
  • 65
  • 5

3 Answers3

1

DELETE * FROM data WHERE id IN (SELECT id FROM data WHERE name="ABNCKN" LIMIT((SELECT COUNT(name) FROM data WHERE name="ABNCKN") - 1)));

SELECT id FROM data WHERE name="ABNCKN" >> this query will get all ids with specific name, and limit will limit the the number of rows returned. The number of rows will be dynamic based on the duplicated rows except one row. and you can add ORDER BY to the query to arrange them.

and finally delete all duplicated rows except one row.

I hope it works well

Elsayed Elbeshry
  • 21
  • 1
  • 2
  • 6
0

Didn't understand very well why you keep some duplicated data in the expected output but what you can try is get the last Id with a JOIN and filter it from the WHERE so is not deleted:

DELETE d1 FROM data d1, data d2
JOIN (SELECT MAX(pid) pid FROM data) d3
WHERE d1.pid != d3.pid AND d1.pid < d2.pid AND d1.id = d2.id;
Yonier
  • 31
  • 5
  • im running some batch jobs to upload data into mysql for every hour, sometimes it has duplicate entries, my requirement is to delete only last date duplicate entries. – SRash Jun 20 '22 at 05:26
0

You can achieve this using self join and add condition to only delete yesterday's data:

DELETE t1 FROM data t1 
  INNER JOIN data t2 
  ON t1.pid < t2.pid 
    AND t1.serviceid = t2.serviceid 
    AND t1.id = t2.id
    AND t1.created = t2.created
  WHERE CAST(t1.created AS DATE) = DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY);
    
SELECT * FROM data;

See this db<>fiddle

For MySQL 8+ you can also achieve this using ROW_NUMBER():

WITH CTE AS
(
  SELECT *, ROW_NUMBER() OVER(PARTITION BY serviceid,id,created ORDER BY pid DESC) AS RN
  FROM data
  WHERE CAST(created AS DATE) = DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY)
)
DELETE data
FROM data
JOIN CTE ON data.pid=CTE.pid
WHERE CTE.RN>1;

SELECT * FROM data;

Check this db<>fiddle

Output after delete:

pid serviceid id created
10 ABNCKN Q0CHZ2GI1VKOPL 2022-06-18 07:27:35
11 ABNCKN Q0CHZ2GI1VKOPL 2022-06-18 07:27:35
12 KSPSLS QLSPDF0S0SDFKK 2022-06-19 07:27:35
14 AKNKSL QLSPDF0S0SDFKK 2022-06-19 07:27:35
Himanshu
  • 31,810
  • 31
  • 111
  • 133
  • ``` DELETE t1 FROM data t1 INNER JOIN data t2 WHERE t1.pid < t2.pid AND t1.serviceid = t2.serviceid AND t1.id = t2.id AND t1.created = t2.created; SELECT * FROM data; ``` this one i already tried, this delete all duplicates entries ref to newly one. i required only to delete yesterday's duplicate data, not all duplicate record. – SRash Jun 20 '22 at 05:21
  • By 'yesterday's duplicate date' do you mean to perform these actions (to find and remove duplicate data) only on yesterday's `created` date? If yes, then simply add that condition in `WHERE` clause. Like this: `WHERE CAST(created AS DATE) = DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY)`. See [this fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=9accc793e3f4e7149bf45327554a67bd). – Himanshu Jun 20 '22 at 06:09
  • @SRash: Check the updated answer now. – Himanshu Jun 20 '22 at 06:31
  • @SRash: Did you see the updated answer? – Himanshu Jun 22 '22 at 13:21