1

I read many posts that teach how to remove duplicate row. But in my situation, there is relationship between tables. I would like to update the relationship before deleting the duplicate rows.

I also read a question in 2013 that is very similar to my situation but I didn't archive it using the method in the posts. deleting duplicates in sql and modifying relationship table accordingly

I am new to mysql and need some help. Thank you in advance. I build this with laravel 8 and mysql 8 and I manage the tables in phpmyadmin.

Here is my tables:

actors
id       actor_id
1        fghj
2        fghj        (to be removed)
3        fghj        (to be removed)
4        zxcv
5        zxcv        (to be removed)

videos
id       code
1        aaa
2        bbb
3        ccc

actor_video
actor_id()     video_id
(=actors.id)   (=videos.id)
1            1
1            3        (to be modified to 1 1)
2            4
2            5        (to be modified to 2 4)
3            2        (to be modified to 3 1)
3            3        (to be modified to 3 1)

I have an idea but I failed to code it. My plan is:

  1. create a column in actors correct_id
  2. find all duplicate rows in actors with COUNT(actor_id)
  3. assign correct_id = min(id) in each duplicate group
  4. modify the relationship table by replacing actor_video.actor_id to actors.correct_id
  5. remove duplicate rows in actor_video
  6. remove duplicate rows in actors
Step 1:
ALTER TABLE actors
ADD correct_id varchar(255);
Step2:
SELECT actor_id,COUNT(actor_id)
FROM `actors` 
GROUP BY actor_id
HAVING COUNT(actor_id) >1
Step3:
????
(this is wrong code. Just try to express my idea)
UPDATE actors
SET correct_id = min(id)
WHERE COUNT(actor_id) >1

Then I stuck since step 3. Sorry if this a slow and stupid idea. Please correct me for a efficient way. Thank you

Philip
  • 63
  • 1
  • 7

2 Answers2

1

Update junction table

UPDATE actor_video av
JOIN actors a1 ON a1.id = av.actor_id
JOIN ( SELECT MIN(id) id, actor_id
       FROM actors
       GROUP BY actor_id ) a2 ON a1.actor_id = a2.actor_id
SET av.actor_id = a2.id;

then delete excess rows

DELETE a1
FROM actors a1
JOIN actors a2 ON a1.actor_id = a2.actor_id
WHERE a1.id > a2.id;

Or use more safe deletion:

DELETE
FROM actors
WHERE NOT EXISTS ( SELECT NULL
                   FROM actor_video 
                   WHERE actors.id = actor_video .actor_id );
Akina
  • 39,301
  • 5
  • 14
  • 25
  • Thank you Akina. The query works very well. And also, it doesn't require us to create an additional column "correct_id". Really learn the mechanism how to update a table corresponding to values in another table. All I need to do next is to remove the duplicate relationship. Thank you again! – Philip May 20 '21 at 22:13
1

You can join the minimal id

You can use this update method to directly change the ids from the bridge table, with the correct id.

and here is a canonical answer how to remove dupes

CREATE TABLE actors (
  `id` INTEGER,
  `actor_id` VARCHAR(4),
  `correct_id` INTEGER
);

INSERT INTO actors
  (`id`, `actor_id`, `correct_id`)
VALUES
  ('1', 'fghj', '0'),
  ('2', 'fghj', '0'),
  ('3', 'fghj', '0'),
  ('4', 'zxcv', '0'),
  ('5', 'zxcv', '0');
SELECT actor_id,MIN(id)
FROM `actors` 
GROUP BY actor_id
actor_id | MIN(id)
:------- | ------:
fghj     |       1
zxcv     |       4
UPDATE 
actors a1 INNER JOIN (SELECT actor_id,MIN(id) minid
FROM `actors` 
GROUP BY actor_id) a2 USING (actor_id)
SET correct_id = minid
SELECT * FROM actors
id | actor_id | correct_id
-: | :------- | ---------:
 1 | fghj     |          1
 2 | fghj     |          1
 3 | fghj     |          1
 4 | zxcv     |          4
 5 | zxcv     |          4

db<>fiddle here

nbk
  • 45,398
  • 8
  • 30
  • 47
  • Thank you guys. I tried both methods from Akina and nbk. Both methods works fine for my condition. I will choose this as the best answer as it seems more fit to the question. I would like to emphasize again that both answers worked very well! – Philip May 20 '21 at 22:11
  • upvote the other answer, if it was helpful too – nbk May 20 '21 at 22:13