How to delete repeated data on MYSQL table to be able to apply an unique value constraint

Question

I have a database with some fields and I want to apply a unique value constraint to a table:

ALTER TABLE assessment_submissions
ADD CONSTRAINT UC_Question UNIQUE (evaluated_user, evaluator_user, question_id);

But there is some data inside the table that doesn't allow me to put this constraint.

I got an error when I tried to apply the constraint:

SQL error 1062: Duplicate entry 154-154-45 for key UC_Question

Take a look at the image below:

The results on the line that starts with id 131271 and id 131413 have the same values on the fields evaluated_user, evaluator_user, and question_id.

This way it's not possible to apply the constraint.

I deleted the duplicated row, but I still was not able to apply the constraint.

I suppose there are more duplicate data inside that table. How can I find all data that is duplicated inside that table? Which query can I use to do that?

I have no idea where I can start.

Does this answer your question? [Remove duplicate rows in MySQL](https://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql) — Nico Haase, Jun 29 '21 at 07:15
Or this? https://stackoverflow.com/questions/2867530/how-to-remove-duplicate-entries-from-a-mysql-db — Nico Haase, Jun 29 '21 at 07:16

score 2 · Answer 1 · edited Jun 29 '21 at 07:07

Kindly use any of these two based on the need at hand. Essentially, you need to decide whether to retain the duplicate set that first hit the database or the set that arrived last.

DELETE t1 FROM assessment_submissions t1 INNER JOIN assessment_submissions t2 
WHERE t1.id < t2.id 
AND t1.evaluated_user=t2.evaluated_user AND t1.evaluator_user = t2.evaluator_user 
AND t1.question_id=t2.question_id;

DELETE t1 FROM assessment_submissions t1 INNER JOIN assessment_submissions t2 
WHERE t1.id > t2.id 
AND t1.evaluated_user=t2.evaluated_user AND t1.evaluator_user = t2.evaluator_user 
AND t1.question_id=t2.question_id;

I prefer the above approach for user environments that have MySQL versions earlier than 8.X

score 1 · Answer 2 · answered Jun 28 '21 at 18:45

This query will return all duplicate rows, along with a couple of IDs belonging to the same "group".

select evaluated_user, evaluator_user, question_id, count(*), min(id), max(id)
from assessment_submissions
group by evaluated_user, evaluator_user, question_id
having count(*) > 1

buddemat · Accepted Answer · 2021-06-29T17:55:03.387

It depends a bit on whether you want to examine the rows before deciding which ones to delete or not (see 1. respectively 2. below).

Let's say you have the following example data:

id  evaluated_user  evaluator_user  question_id   answer_id
1   262             275             157            573
2   262             275             162            593
3   262             275             332           1260
4   262             275             161            589
5   262             275             157            573
6   262             275             157           1425
7   262             275             167            726
8   262             275             167           4593

If you want to take a look at the rows to get the information which ones you need to delete, just grouping by wont get you the ids.

If you have MySQL 8.0, you can use a window function to calculate the number of duplicates for each unique (evaluated_user, evaluator_user, question_id) combination as follows (ordering is optional):

  select *, count(*) over (partition by evaluated_user, evaluator_user, question_id) as cnt
    from example e
order by cnt desc, evaluated_user, evaluator_user, question_id

This will give you

id   evaluated_user  evaluator_user  question_id   answer_id  cnt
1    262             275             157            573       3
5    262             275             157            573       3
6    262             275             157           1425       3
7    262             275             167            726       2
8    262             275             167           4593       2
4    262             275             161            589       1
2    262             275             162            593       1
3    262             275             332           1260       1

In this table, all entries with cnt > 1 are the rows you are interested in. If you just want them, wrap this into a select * from ... where cnt > 1.

For previous versions of MySQL (which don't support window functions), you can achieve the same using a query like the one from @DNNX's answer and joining the result with the original table:

select e.*
  from example e
  join (select evaluated_user, evaluator_user, question_id
          from example
      group by evaluated_user, evaluator_user, question_id
        having count(*) > 1) f
on e.evaluated_user = f.evaluated_user and
   e.evaluator_user = f.evaluator_user and
   e.question_id = f.question_id

Either of this will give you the rows you may want to examine to decide which ones to delete:

id   evaluated_user  evaluator_user  question_id   answer_id 
1    262             275             157            573      
5    262             275             157            573      
6    262             275             157           1425      
7    262             275             167            726      
8    262             275             167           4593

In case you don't need to inspect the data first to decide which records you delete, as long as one line remains, you can build your DELETE statement elegantly using the RANK() function (again, MySQL 8.0):
```
with subtab as (select id, rank() over (partition by evaluated_user, evaluator_user, question_id order by id) as rnk
                  from example)
delete from example e
where e.id in (select id 
                from subtab 
               where rnk > 1)
```
This example will bulk delete all duplicate rows except the one with the smallest id for each unique combination. You can modify the order by statement to influence which records to delete. For example, to keep the record with the highest id instead, you can order by id desc. Or if you wanted to keep the one with the smallest answer_id, you order by answer_id. Note: if you have duplicate entries in the column you are ordering by, you may end up with more than one row left. To avoid that, use row_number() instead of rank().

To get the same without window functions, you can use
```
delete e from example e 
  join example f 
    on e.evaluated_user = f.evaluated_user 
   and e.evaluator_user = f.evaluator_user 
   and e.question_id = f.question_id
   and e.id > f.id ;
```
Again, the statement can be modified depending on which row you want to keep. For example, to keep the one with the highest answer_id, you change the last condition to and e.answer_id < f.answer_id.

See this db<>fiddle for all of the above in action.

How to delete repeated data on MYSQL table to be able to apply an unique value constraint

3 Answers3