0

I want to alter an existing MySQL table to enforce a uniqueness constraint, but the data in the table currently is not unique.

Is there a way to modify only the set of duplicated data while leaving the rest unaffected? I don't want to delete the offending records as the rest of the data in the table is valuable.

Example:

> SELECT id, email from my_table;
+--------------------------------------+----------------+
| id                                   | email          |
+--------------------------------------+----------------+
| 070d5788-428e-4444-bd15-00b60f335bd6 | test@email.com |
| 0a81ae48-8900-4444-86da-d9418ea2ca00 | test@email.com |
| 0df1e66d-0053-4444-9774-c7daf5652d54 | test@email.com |
| 130fce57-178e-4444-b2cb-1c5c2c29fe07 | test@email.com |
...

SQL to run:

ALTER TABLE my_table ADD UNIQUE (email);
Milk
  • 2,469
  • 5
  • 31
  • 54
  • I'd recommend creating a new table (`my_table_new`) with this constraint and containing only unique data (something like `INSERT INTO my_table_new SELECT DISTINCT id, email FROM my_table` [though you'd likely want to define an order on this to ensure consistent results), then rename `my_table` to `my_table_old`, and then finally `my_table_new` to `my_table` so any previous query to `my_table` won't need to be updated. – WOUNDEDStevenJones Jun 29 '20 at 21:28
  • Also see [delete duplicates](https://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql), – danblack Jun 29 '20 at 23:52

2 Answers2

0

This is a good question. And not really easy to do. But, what you can do is the following:

  • Add a new column in the table with a default of "1".
  • Update the existing values to a sequential number for each email.
  • Then add a unique constraint.

This will allow you to keep your current data. And prevent duplicates on new data, including not allowing duplicates between new emails and existing emails.

So:

alter table my_table add email_counter int default 1;

update my_table t join
       (select t.*, row_number() over (partition by email order by id) as seqnum
        from my_table t
       ) tt
       on t.id = tt.id
    set email_counter = seqnum;

create unique index unq_my_table_email_email_counter on email(email_counter);

Then, when you insert a new email, the counter will have a value of "1" (don't reset it in the insert!). If the email is already in the table, you will have a unique index violation.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

First, find the offending rows of data:

SELECT email, CONCAT("'", GROUP_CONCAT(id SEPARATOR "','"), "'")
FROM my_table
GROUP BY email
HAVING COUNT(email) > 1;

These are the records that need to be updated.

It looks like the id column is a UUID, which should be unique already. We can use the data in this column to seed the new unique email address. e.g. 070d5788-428e-4444-bd15-00b60f335bd6@email.com

Update the records using the following:

UPDATE my_table
SET email = CONCAT(id, '@email.com')
WHERE id IN (
  <ids_pasted_from_previous_output>
);

After this has been run, the data should be unique and the table can be altered to enforce the uniqueness constraint.

Milk
  • 2,469
  • 5
  • 31
  • 54