0

We're going through a bit of a clean-up exercise and I need to remove duplicate data that has accidentally been added to our database table. The ID is obviously different, but other fields are the same.

I can use the following query to select the duplicate data sets:

SELECT user_id, start_datetime, count(id) AS dup_count
FROM our_table
WHERE status = 1
GROUP BY user_id, start_datetime
HAVING count(id) > 1;

What I need to do is create a query that would take each of the duplicate IDs APART FROM THE FIRST and use that to update the status to 0.

I'm not sure I can do this is one query, but I think the steps are as follows:

  1. Run a query similar to the one above
  2. Extract all the IDs for the duplicate sets
  3. Ignore the first in the list as we don't want to alter the correctly added first record
  4. Run the update on the remaining set of IDs

Am I out of luck here - or is it possible to do?

Many thanks!

  • Possible? It's dead easy, but I suggest you post some sample data for others to mess about with. Show a BEFORE and AFTER – Strawberry Nov 26 '14 at 20:56
  • 1
    http://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows same idea instead of doing a delete just do an update – thermite Nov 26 '14 at 20:57
  • I'd do it with update ... Where id in (your select here) – eduyayo Nov 26 '14 at 20:58

2 Answers2

0

You can do this with an update/join:

UPDATE our_table ot JOIN
       (SELECT user_id, start_datetime, count(id) AS dup_count, min(id) as minid
        FROM our_table
        WHERE status = 1
        GROUP BY user_id, start_datetime
        HAVING count(id) > 1
       ) dups
       ON ot.user_id = dups.user_id and
          ot.start_datetime = dups.start_datetime and
          ot.id > dups.minid
    SET ot.status = 0;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon. This looks like it'll work - but I get a lock error when running it which I need to look into. I did try tweaking it to a select to see all the results I needed to update so fairly confident it's what I need. As soon as I get it working I'll come back and mark it as answered. Many thanks again! – Kevin Davies Nov 27 '14 at 08:41
  • @KevinDavies . . . This query per se should not be causing the lock error. MySQL does support `update`/`join`, even when the same table is mentioned multiple times. – Gordon Linoff Nov 28 '14 at 15:13
  • Thanks Gordon. Problem has been solved another way, but based on the select happy that this is the solution. Many thanks. – Kevin Davies Nov 30 '14 at 15:58
0

You can use this update query that will join OUR_TABLE with itself:

UPDATE
  our_table o1 INNER JOIN our_table o2
  ON o1.status=1
     AND o2.status=1
     AND o1.user_id = o2.user_id
     AND o1.start_datetime = o2.start_datetime
     AND o1.ID > o2.ID
SET
  o1.status = 0

Please see an example fiddle here.

fthiella
  • 48,073
  • 15
  • 90
  • 106