2

I have a problem, actually a question on how would be the easiest way to delete all the duplicates in a MySQL table based on the first SELECT's name and timestamp.

So, I have this table:

SELECT * FROM search;
+------+--------+------------+--------------------------+-----------------------+
|  id  |  uid   |   string   |           cat            |         timestamp     | 
+------+--------+------------+--------------------------+-----------------------+
|39523 |   87   |eat to live |  a:1:{i:0;s:5:"toate";}  |  2013-10-07 17:01:41  |
|39524 |   87   |eat to live |  s:6:"author";           |  2013-10-07 17:01:41  |
|39525 |   87   |eat to live |  s:6:"people";           |  2013-10-07 17:01:41  |
|39526 |   87   |eat to live |  s:7:"company";          |  2013-10-07 17:01:41  |
|39527 |   87   |eat to live |  s:6:"author";           |  2013-10-07 17:01:42  |
|39532 |   31   |  friends   |  a:1:{i:0;s:5:"toate";}  |  2013-10-07 17:04:17  |
|39533 |   31   |  friends   |  s:6:"people";           |  2013-10-07 17:04:17  |
|39534 |   31   |  friends   |  s:7:"company";          |  2013-10-07 17:04:17  |
|39535 |   31   |  friends   |  s:6:"author";           |  2013-10-07 17:04:17  |
+------+--------+------------+--------------------------+-----------------------+

And I want to have left only the first unique appearance. My solution would be to Select all the rows, go through' them, and for each Row to delete all the rows that match the same string, the same uid and timestamp to be either the same timestamp or +1 second.

Any thoughts / suggestions ? I can do this in PHP and was wondering if there is an easier way to do this directly with an SQL query.

Thanks

Eek
  • 1,740
  • 1
  • 15
  • 24
  • have you seen this question: http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-mysql-table – tommyd456 Oct 07 '13 at 16:26
  • I do not to delete all duplicates. There may be another 100 persons that have searched for "friends" I only want to delete the next 3 rows that are not more distant than 1 second apart from the first. I can also have only 1 of the rows in that table. – Eek Oct 07 '13 at 16:38

2 Answers2

3

MySQL supports multi-table DELETE, so you can do a self-join such that only rows with a greater id are included in the deletion.

DELETE s2
FROM search AS s1 JOIN search AS s2 
  ON s1.string = s2.string AND s1.uid = s2.uid AND s1.id < s2.id;

If you want to add some logic to the join so that it only deletes if the matching s2 rows are not more distant than 1 second apart from the first, you can do that too:

DELETE s2
FROM search AS s1 JOIN search AS s2 
  ON s1.string = s2.string AND s1.uid = s2.uid AND s1.id < s2.id;
    AND s2.timestamp <= s1.timestamp+1

But if you have many entries in a row that are each 1 second apart, this will delete all but the very first. I'm not sure what you want to have done in that case.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Worked flawlessly :) Altho' you have a mistype on your second SQL, move `;` one row down :D The maximum duplicate entries were 4 and none of them were more than 1 second apart. Keeping the first was what I wanted :) Worked like charm, and in only 8 seconds. Wow. :D Thanks! – Eek Oct 07 '13 at 17:05
  • Oh damn. On the bigger table I get `#1205 - Lock wait timeout exceeded; try restarting transaction` :(( – Eek Oct 07 '13 at 17:15
  • 1
    Sounds like someone else is currently executing a long-running change and holding a lock on part of the table. Note that doing this DELETE without applying it to a subset of the table is effectively going to lock every row in the table. You might want to do it in batches, against a subset of uid's. – Bill Karwin Oct 07 '13 at 17:24
  • Yup, that's what happened. I fixed it :D Thanks! :) – Eek Oct 08 '13 at 07:55
0

You can delete using this code, based on a duplicate is the one which have the same uid and string; this code will left only the first appearance.

DELETE FROM search 
WHERE id NOT IN 
(SELECT MIN(id) FROM search GROUP BY uid, string);
Lolito
  • 413
  • 3
  • 9