Delete duplicates from db

Question

I have table like following

id | a_id | b_id | success
--------------------------
1    34      43      1
2    34      84      1
3    34      43      0
4    65      43      1
5    65      84      1
6    93      23      0
7    93      23      0

I want delete duplicates with same a_id and b_id, but I want keep one record. If possible kept record should be with success=1. So in example table third and sixth/seventh record should be deleted. How to do this?

I'm using MySQL 5.1

check this link http://www.besttechtools.com/articles/article/sql-query-to-delete-duplicate-rows — Siva, Sep 07 '14 at 07:52
But my case is more complicated and I don't know how to keep one record with succes = 1 (if possible) — andrzej1_1, Sep 07 '14 at 08:05
if say two records are duplicates with success = 0, can u delete both, or need one. — BatScream, Sep 07 '14 at 08:33

BatScream · Accepted Answer · 2015-06-25T18:17:08.847

The task is simple:

Find the minimum number of records that should not be deleted.
Delete the other records.

The Oracle way,

delete from sample_table where id not in(
select id from
(
 Select id, success,row_number()  
 over (partition by a_id,b_id order by success desc) rown 
 from sample_table
) 
where (success = 1 and rown = 1) or rown=1)

The solution in mysql:

Will give you the minimum ids that should not be deleted.:

Select id from (SELECT * FROM report ORDER BY success desc) t 
group by  t.a_id, t.b

o/p:

You can delete the other rows.

  delete from report where id not in (the above query)

The consolidated DML:

delete from report 
  where id not in (Select id 
                    from (SELECT * FROM report 
                          ORDER BY success desc) t 
                    group by  t.a_id, t.b_id)

Now doing a Select on report:

ID  A_ID    B_ID    SUCCESS
1   34       43     1
2   34       84     1
4   65       43     1
5   65       84     1
6   93       23     0

You can check the documentation of how the group by clause works when no aggregation function is provided:

When using this feature, all rows in each group should have the same values for the columns that are omitted from the GROUP BY part. The server is free to return any value from the group, so the results are indeterminate unless all values are the same.

So just performing an order by 'success before the group by would allow us to get the first duplicate row with success = 1.

oops - the solution is in oracle sql, wait a minute let me transform the logic to mysql. — BatScream, Sep 07 '14 at 08:54

CRThaze · Answer 2 · 2014-09-07T08:20:00.910

1

How about this:

CREATE TABLE new_table 
  AS (SELECT * FROM old_table WHERE 1 AND success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;

This method will create a new table with a temporary name, and copy all the deduped rows which have success = 1 from the old table. The old table is then dropped and the new table is renamed to the name of the old table.

If I understand your question correctly, this is probably the simplest solution. (though I don't know if it's really efficient or not)

edited Sep 07 '14 at 08:20

answered Sep 07 '14 at 08:14

CRThaze

544
3
16

So what if any of duplicates has not succes=1? – andrzej1_1 Sep 07 '14 at 08:32
Well your question said that those should also be removed, and this would do that. This preserves rows 1, 2, 4, & 5 – CRThaze Sep 07 '14 at 08:33
'Sixth/seventh' mean sixth OR seventh – andrzej1_1 Sep 07 '14 at 08:39
well removing the `AND success = 1` would give you that result for this example data, however it also removes the guarantee that you will favor the success=1 rows on other data. – CRThaze Sep 07 '14 at 08:42

Jens Krogsboell · Answer 3 · 2014-09-07T09:00:51.093

This should work:

If procedural programming is available to you like e.g. pl/sql it is fairly simple. If you on the other hand is looking for a clean SQL solution it might be possible but not very "nice". Below is an example in pl/sql:

begin
  for x in ( select a_id, b_id
             from   table
             having count(*) > 1
             group  by a_id, b_id )
  loop
    for y in ( select *
               from   table
               where  a_id = x.a_id
               and    b_id = x.b_id
               order  by success desc )
    loop
      delete from table
      where  a_id = y.a_id
      and    b_id = y.b_id
      and    id != x.id;
      exit; // Only do the first row
    end loop;
  end loop;
end;

This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.

or perhaps:

declare
  l_a_id integer := -1;
  l_b_id integer := -1;
begin
  for x in ( select *
             from   table
             order  by a_id, b_id, success desc )
  loop
    if  x.a_id = l_a_id and x.b_id = l_b_id
    then
      delete from table where  id = x.id;
    end if;
    l_a_id := x.a_id;
    l_b_id := x.b_id;
  end loop;
end;

I'm sorry - I am not into procedural MySQL. Maybe someone else can kick in? — Jens Krogsboell, Sep 07 '14 at 08:39

score 0 · Answer 4 · edited May 23 '17 at 12:14

0

In MySQL, if you dont want to care about which record is maintained, a single alter table will work.

ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)

It ignores the duplicate records and maintain only the unique records.

A useful links : MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?

edited May 23 '17 at 12:14

Community

1
1

answered Jun 25 '15 at 18:21

georgecj11

1,600
15
22

Delete duplicates from db

4 Answers4