MySQL - remove duplicates after updating child records to point to single parent record

Question

I have a parent table 'policy' that contains multiple duplicate rows, in the sense that there are 2 fields which together constitutes a duplicate (i.e. policy_number and provider_id).

I also have a child table 'branch_policy' that contains child records for the duplicate parent rows (there is another child table of 'policy', but I'm hoping I could just apply the same solution to the 2nd child table as well).

Before I delete the duplicate parent rows, I want to re-parent the child records of the duplicate parent records to point one parent record, in order to safely delete the duplicate parent records.

E.g.:

policy table:

id  policy_number  provider_id  originating_branch
--------------------------------------------------
1   123            1            1
2   123            1            2

branch_policy table:

id  policy_id
--------------
1   1
2   2

I would like the 2nd record of branch_policy to be set to policy_id = 1, then delete policy record with id 2, so that I end with

policy table:

id  policy_number  provider_id  originating_branch
--------------------------------------------------
1   123            1            1

branch_policy table:

id  policy_id
--------------
1   1
2   1

Please note that it's not important which duplicate parent record is selected as the 'survivor'.

I hope that's clear!

http://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql — Michael Benjamin, Mar 21 '13 at 23:05

score 1 · Accepted Answer · answered Mar 21 '13 at 23:37

SQL Fiddle

MySQL 5.5.30 Schema Setup:

create table policy (
  id int not null, 
  policy_number varchar(100) not null,
  provider_id int not null, 
  originating_branch int not null
);

create table branch_policy(id int, policy_id int);

insert policy (id, policy_number, provider_id, originating_branch) 
  values 
    (1, '123', 1, 1),
    (2, '123', 1, 2),
    (3, '1234', 3, 3),
    (4, '1234', 3, 4)
;

insert branch_policy(id, policy_id) values 
  (1, 1), (2, 2), (3, 2),
  (4, 3), (5, 3), (6, 4)
;


update branch_policy bp
  join (select  
          po.id as old_id,
          pn.new_id
        from policy po
          join (
            select
              policy_number,
              provider_id,
              min(id) as new_id
            from policy
            group by policy_number, provider_id) as pn
          on po.policy_number = pn.policy_number
            and po.provider_id = pn.provider_id) as p
  on bp.policy_id = p.old_id
set bp.policy_id = p.new_id
;

delete po
from policy po
  join (
    select
      policy_number,
      provider_id,
      min(id) as new_id
    from policy
    group by policy_number, provider_id) as pn
  on po.policy_number = pn.policy_number
    and po.provider_id = pn.provider_id
where po.id <> pn.new_id

Query 1:

select * from policy

Results:

| ID | POLICY_NUMBER | PROVIDER_ID | ORIGINATING_BRANCH |
---------------------------------------------------------
|  1 |           123 |           1 |                  1 |
|  3 |          1234 |           3 |                  3 |

Query 2:

select * from branch_policy

Results 2:

| ID | POLICY_ID |
------------------
|  1 |         1 |
|  2 |         1 |
|  3 |         1 |
|  4 |         3 |
|  5 |         3 |
|  6 |         3 |

MySQL - remove duplicates after updating child records to point to single parent record

1 Answers1