1

I know there is a ton of same questions about finding and removing duplicate values in mySQL but my question is a bit different:
I have a table with columns as ID, Timestamp and price. A script scrapes data from another webpage and saves it in the database every 10 seconds. Sometimes data ends up like this:

| id | timestamp | price |   
|----|-----------|-------|   
| 1  | 12:13     | 100   |   
| 2  | 12:14     | 120   |   
| 3  | 12:15     | 100   |   
| 4  | 12:16     | 100   |   
| 5  | 12:17     | 110   |   

As you see there are 3 duplicated values and removing the price with ID = 4 will shrink the table without damaging data integrity. I need to remove continuous duplicated records except the first one (which has the lowest ID or Timestamp).
Is there a sufficient way to do it? (there is about a million records)
I edited my scraping script so it checks for duplicated price before adding it but I need to shrink and maintain my old data.

Slava Rozhnev
  • 9,510
  • 6
  • 23
  • 39

3 Answers3

1

My query is based on @Tim Biegeleisen one.

-- delete records
DELETE
FROM yourTable t1
-- where exists an older one with the same price
WHERE EXISTS (SELECT 1
              FROM yourTable t2
              WHERE t2.price = t1.price
                    AND t2.id < t1.id
-- but does not exists any between this and the older one
                    AND NOT EXISTS (SELECT 1
                                    FROM yourTable t3
                                    WHERE t1.price <> t3.price
                                          AND t3.id > t2.id
                                          AND t3 < t1.id));

It deletes records where exists an older one with same price but does not exists any different between

It could be checked by timestamp column if id column is not numeric and ascending.

Jirka Picek
  • 589
  • 5
  • 19
1

I am just grouping based on price and filtering only one record per group.The lowest id gets displayed.Hope the below helps.

 select id,timestamp,price from yourTable group by price having count(price)>0;
aish.a
  • 176
  • 1
  • 7
1

Since MySQL 8.0 you can use window function LAG() in next way:

delete tbl.* from tbl
join (
    -- use lag(price) for get value from previous row
    select id, lag(price) over (order by id) price from tbl
) l 
-- join rows with same previous price witch will be deleted
on tbl.id = l.id and tbl.price = l.price;

fiddle

Slava Rozhnev
  • 9,510
  • 6
  • 23
  • 39