1

Given the following table:

+----+--------+--------+------------+
| id | val1   | val2   | timestamp  |
+----+--------+--------+------------+
| 1  | 50     | 100    | 1471869500 |
| 2  | 50     | 100    | 1471869800 |
| 3  | 60     | 70     | 1471864400 |
| 4  | 60     | 80     | 1471863300 |
| 5  | 60     | 90     | 1471862200 |
| 6  | 60     | 100    | 1471861100 |
+----+--------+--------+------------+

I want to remove any rows that have duplicate values in columns val1 and val2 (so that would be rows 1 and 2) but keep the row that was inserted last (row 2). In the example table only row 1 would be removed.

Hoe do I create an SQLite statement that does this for me?

I have tried:

Select statement to find duplicates on certain fields

Delete all Duplicate Rows except for One in MySQL?

https://dba.stackexchange.com/questions/101697/sqlite3-remove-duplicates-by-multiple-columns

Cheers

Community
  • 1
  • 1
Gijs de Jong
  • 967
  • 9
  • 14

2 Answers2

1

It appears that SQLite does not support direct joining in its DELETE syntax, but you can still use a correlated subquery to get the job done. Try this query:

DELETE
FROM yourTable
WHERE timestamp t <> (SELECT MAX(p.timestamp) FROM yourTable p
                      WHERE p.val1 = t.val1 AND p.val2 = t.val2)
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • I should have changed the timestamps for the other rows more I see. When the timestamps are different this query does not work properly. It did work great for the example though :) I will update the example table in the original question. Please see sqlfiddle.com/#!7/22503/1 – Gijs de Jong Aug 22 '16 at 15:23
  • @GijsdeJong You are misinterpreting the SQL Fiddle. When you apply my query logic to a `SELECT`, you will return the records which you _do_ want to delete. For most of your data, there is only one record per (`val1`, `val2`) group. In this case, no records will be selected out. During a delete, the opposite will happen, and the records selected out will be deleted. – Tim Biegeleisen Aug 22 '16 at 15:45
  • True, but could you think of a solution where the rows would be deleted by id? The actual data I am working with contains a lot more rows and I can not be sure that deletion by timestamp will never delete more rows than I would like. – Gijs de Jong Aug 23 '16 at 08:14
  • Your requirement necessitates using the timestamp in the query. If you want to restrict deletion to a range of ids, then just add a `WHERE` clause. – Tim Biegeleisen Aug 23 '16 at 08:17
  • 1
    Thanks for you help :) – Gijs de Jong Aug 23 '16 at 09:28
  • @GijsdeJong Not a problem, any time :-) – Tim Biegeleisen Aug 23 '16 at 09:28
0

If I run the fiddle most of the data set is wiped out and only id=5 remains. So given:

I want to remove any rows that have duplicate values in columns val1 and val2 (so that would be rows 1 and 2) but keep the row that was inserted last (row 2). In the example table only row 1 would be removed.

This should do the trick and remove row 1 only:

SELECT * FROM queue
WHERE rowid NOT IN (
  SELECT MAX(rowid) 
  FROM queue 
  GROUP BY val1, val2
);

http://sqlfiddle.com/#!7/6cc1d/1