Renaming the duplicate data in sql

Question

I am able to find out the duplicate data using this query

   SELECT names FROM group GROUP BY names HAVING count(*) > 1

I am able to get the duplicate data.I just need to know how to rename this duplicate data with the name to new

  INPUT
+-----------------+               
| names           |
+-----------------+
| text1           |
| text2           |
| text3           |
| text1           |
| text3           |
| text4           |
+-----------------+

OUTPUT  
+-----------------+
| names           |
+-----------------+ 
| text1           |
| text2           |
| text3           |
| text1 new value |->RENAMED
| text3 new value |->RENAMED
| text4           |
+-----------------+

and what should happen if there are more than on duplicate ? adding new would reduce the duplicates by 1 and then still some duplicates (with an added new) would be in your data... — Yahia, Sep 14 '11 at 12:54
ya it would reduce the duplicates and rename to that name along with new — , Sep 14 '11 at 12:58

score 15 · Accepted Answer · edited Apr 11 '18 at 15:21

Assuming you have some sort of primary key on the table, like an auto increment id, you can do the following.

UPDATE group 
SET names = CONCAT(names,' Copy 1')
WHERE ID IN
(
SELECT MAX(ID) 
FROM group 
GROUP BY names 
HAVING count(*) > 1
);

To explain, it will find anything with a duplicate, pick up the Maximum ID for anything in that set, and append "copy 1" to the end of it. You may still have some left as duplicates if you had certain names 3 or more times. Just run it again , this time with 'copy 2' instead of 'copy 1'. Keep repeating this process until you get rid of all the duplicaates.

Update. To borrow an idea from @Yahia and use UUID, you can do the following if you want to do it all in one query.

UPDATE group 
SET names = CONCAT(names, CONCAT ( ' Copy ', UUID_SHORT() ) )
WHERE 
ID NOT IN
(
SELECT MIN(ID) 
FROM group 
GROUP BY names 
HAVING count(*) > 1
) AND
ID IN
(
SELECT ID
FROM group AS G1
INNER JOIN group AS G2
ON G1.names = G2.names AND G1.ID <> G2.ID
);

Also works for sql server, just replace UUID_SHORT() with NEWID(). Very nice solution! +1 — Alan Ball, Aug 23 '17 at 08:15
If you get Error 1093 - Can't specify target table for update in FROM clause see https://stackoverflow.com/a/45498/203915 for a workaround — Mark Robinson, Apr 11 '18 at 15:04
Also works for sqlite, just replace `CONCAT(names,' Copy 1')` with `names || " Copy 1"` — TBG, Aug 22 '22 at 12:26

Devart · Answer 2 · 2011-09-15T07:07:50.273

Try this one -

UPDATE table1 n 
  JOIN (SELECT names FROM table1 GROUP BY names HAVING count(*) > 1) d
    ON n.names = d.names
SET n.names = 'new value';

EDIT:

Full code -

CREATE TABLE table1(
  id INT(11) PRIMARY KEY AUTO_INCREMENT,
  names VARCHAR(255) DEFAULT NULL
);    

INSERT INTO table1 VALUES 
  ('text1'),
  ('text2'),
  ('text3'),
  ('text1'),
  ('text3'),
  ('text4');

UPDATE table1 n 
  JOIN (SELECT names, MIN(id) min_id FROM table1 GROUP BY names HAVING COUNT(*) > 1) d
    ON n.names = d.names AND n.id <> d.min_id
SET n.names = CONCAT(n.names, ' new value');

SELECT * FROM table1;

+----+-----------------+
| id | names           |
+----+-----------------+
|  1 | text1           |
|  2 | text2           |
|  3 | text3           |
|  4 | text1 new value |
|  5 | text3 new value |
|  6 | text4           |
+----+-----------------+

as shown in the output how can we just replace new value for one duplicate name — , Sep 15 '11 at 04:34
OK. In this case we should add unique field to identify records. I have changed my answer. — Devart, Sep 15 '11 at 07:10

Yahia · Answer 3 · 2011-09-14T15:29:57.603

2

use (corrected as per comment)

UPDATE Group 
SET Names = CONCAT(Names, CONCAT ( ' Copy ', UUID_SHORT() ) )
WHERE 
ID NOT IN
(
SELECT MIN(ID) 
FROM group 
GROUP BY names 
HAVING count(*) > 1
) AND
ID IN
(
SELECT ID
FROM group 
GROUP BY names 
HAVING count(*) > 1
);

this makes all duplicates unique with one execution by concatenating a unique UUID_SHORT... except the one with the smallest ID - it stays untouched...

edited Sep 14 '11 at 15:29

answered Sep 14 '11 at 13:02

Yahia

69,653
9
115
144

What about IDs that are in not in the Max(ID) set? This query is a lot like mine, in that it will be have to run multiple times, each time getting whatever the Max(ID) Is. The NOT IN part is actually redundant, because you are only pickup the up the MAX ids. – Kibbee Sep 14 '11 at 13:08
What about changing the second IN part to: AND names IN ( SELECT names FROM group GROUP BY names HAVING count(*) > 1 ); – H-Man2 Sep 14 '11 at 13:15
@H-Man2 that could work... not sure that this change would be enough – Yahia Sep 14 '11 at 13:58
Actually, that won't work, the since you are grouping by Name, it won't pick up all the IDs, but rather a random (probably first) id from each group of matching names. MySQL is unique in this regard in that it allows you to select columns, not aggregated, that you haven't included in the grouping. Most other databases don't allow this, as the resulting value isn't consistent when run multiple times across the same data. – Kibbee Sep 14 '11 at 14:11
@Kibee you are right - changing the IN as suggested by H-Man2 wouldn't work except one would include some "monster-join"... – Yahia Sep 14 '11 at 15:28

Renaming the duplicate data in sql

3 Answers3

Linked

Related