2

I have a table with results that come from an app. Some of the results are duplicate and I can see which "survey_question_id" has duplicate answers.

With this query I can see how many duplicates I have for each "survey_question_id":

    SELECT `id`,`survey_question_id`,
   `question`,`date`, COUNT(*) 
   FROM answers 
   GROUP BY `survey_question_id`, `date` 
   HAVING COUNT(*) > 1 and `date` > '2015-10-15'

Table Results:

enter image description here

How can I delete duplicates?

pmcmf
  • 23
  • 9

3 Answers3

2

You can

  1. create table your_table2 as select distinct * from original_table and then just
  2. drop original_table with some cascade constraints and
  3. alter your_table2, rename it and add back all constraints or just delete all data from original_table and do just insert into original_table select * from your_table2

I think that this is similar to your problem.

Community
  • 1
  • 1
Ivanka Eldé
  • 228
  • 1
  • 2
  • 12
1

For Duplicate records deletion, you can use this:

IF you want to keep LOWEST id than you can use this:

DELETE n1 FROM answers n1, answers n2 WHERE n1.id > n2.id 
AND n1.id <> n2.id
AND n1.survey_question_id = n2.survey_question_id 
AND n1.question = n2.question 
AND n1.date = n2.date

IF you want to keep HIGEST id than you can use this:

DELETE n1 FROM answers n1, answers n2 WHERE n1.id < n2.id 
AND n1.id <> n2.id
AND n1.survey_question_id = n2.survey_question_id 
AND n1.question = n2.question 
AND n1.date = n2.date

Add UNIQUE INDEX if you think this can be happened again in future.

ALTER IGNORE TABLE answers   
ADD UNIQUE INDEX (id,survey_question_id,
   question,date);

Side Note: please make sure you have a copy of this table before executing any kind of bulk Deletion.

devpro
  • 16,184
  • 3
  • 27
  • 38
0

Try this:

DELETE FROM answers
WHERE date > '2015-10-15'
AND id NOT IN 
  (SELECT id
      FROM (SELECT MIN(id) AS id  FROM answers
            WHERE date > '2015-10-15'
            GROUP BY survey_question_id) x
  )
Plebios
  • 835
  • 1
  • 7
  • 17
  • 1
    Its very risky query – devpro Dec 31 '15 at 16:54
  • This would delete a bunch of good data while leaving duplicates in the system. Based on the data in the screenshot, the 2 duplicates of ID 141450 would be left intact, and all other data (in the screenshot) would be destroyed. – EvilBob22 Dec 31 '15 at 16:56
  • Sorry but this one does not work. Thanks anyway – pmcmf Dec 31 '15 at 21:20