Remove continuous duplicated values with different IDs in MySQL

Question

I know there is a ton of same questions about finding and removing duplicate values in mySQL but my question is a bit different:
I have a table with columns as ID, Timestamp and price. A script scrapes data from another webpage and saves it in the database every 10 seconds. Sometimes data ends up like this:

| id | timestamp | price |   
|----|-----------|-------|   
| 1  | 12:13     | 100   |   
| 2  | 12:14     | 120   |   
| 3  | 12:15     | 100   |   
| 4  | 12:16     | 100   |   
| 5  | 12:17     | 110   |

As you see there are 3 duplicated values and removing the price with ID = 4 will shrink the table without damaging data integrity. I need to remove continuous duplicated records except the first one (which has the lowest ID or Timestamp).
Is there a sufficient way to do it? (there is about a million records)
I edited my scraping script so it checks for duplicated price before adding it but I need to shrink and maintain my old data.

Jirka Picek · Answer 1 · 2020-11-24T07:36:13.490

My query is based on @Tim Biegeleisen one.

-- delete records
DELETE
FROM yourTable t1
-- where exists an older one with the same price
WHERE EXISTS (SELECT 1
              FROM yourTable t2
              WHERE t2.price = t1.price
                    AND t2.id < t1.id
-- but does not exists any between this and the older one
                    AND NOT EXISTS (SELECT 1
                                    FROM yourTable t3
                                    WHERE t1.price <> t3.price
                                          AND t3.id > t2.id
                                          AND t3 < t1.id));

It deletes records where exists an older one with same price but does not exists any different between

It could be checked by timestamp column if id column is not numeric and ascending.

Thanks for great answer but I'm having a hard time executing it. It gives me error 1064. — Mehdi Maazi, Nov 12 '20 at 11:08

score 1 · Answer 2 · answered Nov 12 '20 at 11:01

1

I am just grouping based on price and filtering only one record per group.The lowest id gets displayed.Hope the below helps.

 select id,timestamp,price from yourTable group by price having count(price)>0;

answered Nov 12 '20 at 11:01

aish.a

176
1
7

Slava Rozhnev · Accepted Answer · 2020-11-12T11:25:56.530

1

Since MySQL 8.0 you can use window function LAG() in next way:

delete tbl.* from tbl
join (
    -- use lag(price) for get value from previous row
    select id, lag(price) over (order by id) price from tbl
) l 
-- join rows with same previous price witch will be deleted
on tbl.id = l.id and tbl.price = l.price;

fiddle

edited Nov 12 '20 at 11:25

answered Nov 12 '20 at 11:07

Slava Rozhnev

9,510
6
23
39

1

Added some explanation – Slava Rozhnev Nov 12 '20 at 11:26

Remove continuous duplicated values with different IDs in MySQL

3 Answers3