1

I have a database table with the columns: email, name, id

There are duplicate records with the same email and name (ids may or may not be different)

I want to remove all duplicates and not keep any of them.

Example

email            | name         | id
-------------------------------------------------
a@b.com          | a            | 2               --- remove
a@b.com          | a            | 4               --- remove
b@c.com          | b            | 3
c@d.com          | c            | 5

What sql query can I use in phpmyadmin to do this?

3 Answers3

2

You could use EXISTS:

DELETE FROM TableName t1
WHERE EXISTS
(
    SELECT 1 FROM TableName t2
    WHERE t1.id <> t2.id
      AND COALESCE(t1.email,'') = COALESCE(t2.email,'')
      AND COALESCE(t1.name,'')  = COALESCE(t2.name,'')     
)

I've used COALESCE to also delete duplicates if the emails or names are null.

Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • Hi, thank you. Trying to digest your answer now. Need to understand the meaning of <> and coalesce first. – user3509426 Jan 29 '15 at 12:01
  • @user3509426: `<>` simply means "not equal" –  Jan 29 '15 at 12:06
  • @user3509426; `<>` just means different(like `!=` in C# or other programming languages). `COALESCE` is a function which replaces `NULL` with something else, in this case an empty string. Otherwise you can't compare an email that is `NULL` with comparison oprators like `=` or `<>` because `NULL` means "undefined". Another way without `COALESCE` is to use `IS NULL`: `AND (t1.email IS NULL AND t2.email IS NULL) OR (t1.email = t2.email) ....` – Tim Schmelter Jan 29 '15 at 12:07
  • Hi, I have ran your code. There is a syntax error though. I have been trying to play around with the syntax but I cannot fix it. – user3509426 Jan 29 '15 at 13:53
  • @user3509426: what query have you used and what was the syntax error exactly? – Tim Schmelter Jan 29 '15 at 13:58
  • @TimSchmelter I ran `DELETE FROM Test t1 WHERE EXISTS ( SELECT 1 FROM Test t2 WHERE t1.id <> t2.id AND COALESCE(t1.email,'') = COALESCE(t2.email,'') AND COALESCE(t1.name,'') = COALESCE(t2.name,'') ); ` and the error is `#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't1 WHERE EXISTS ( SELECT 1 FROM Test t2 WHERE t1.id <> t2.id ' at line 1` – user3509426 Jan 29 '15 at 14:12
0

In MySQL, you should do this with a join:

delete t
   from example t join
        (select email, name
         from example
         group by email, name
         having count(*) > 1
        ) todelete
        on t.email = todelete.email and t.name = todelete.name;

Unfortunately, MySQL does not support simple subqueries on the table being modified in an update or delete statement.

EDIT:

Deleting a lot of records can be a performance bottleneck. To get around this, create a temporary table with the records you want, truncate the original table, and re-insert them.

So, try this:

create temporary table tempt as 
   select t.*
   from example t join
        (select email, name
         from example
         group by email, name
         having count(*) = 1
        ) tokeep
        on t.email = tokeep.email and t.name = tokeep.name;

truncate table example;

insert into example
    select * from tempt;

Try the select query first to be sure it has reasonable performance and does what you want.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Hi, I'm not sure if it's because my table is huge (180k records), but I've ran your code and got "Error in Processing Request Error code: 504 Error text: Gateway Time-out" Maybe I should duplicate the table with lesser records to test? – user3509426 Jan 29 '15 at 12:23
  • Unfortunately, the whole server seems to not be responding now. I doubt running 1 sql query can affect the whole server? And I have already closed all browser tabs. – user3509426 Jan 29 '15 at 12:39
  • Plesk would not load thus I had to SSH in to kill the sqld process. (1) I then tested using a table with 28 records. 14 are duplicates of the other 14. I ran your code and all records were wiped out, which is great. (2) I tested again by throwing in some unique records, and again your code worked fine since those unqiue ones were intact. (3) Now I just need to figure out how to run your code for my 180k records table. Would you have any advice for it? – user3509426 Jan 29 '15 at 14:14
  • Perfect, thank you - I have tested quite a number of times and each took less than 3mins to execute on my 180k records. I have found that there are values that hinder finding duplicates - example there is a block of space in front or behind a name/email. and these blocks are in different sizes. They are not the spacebar (that you hit on your keyboard) spaces. Would you know what keywords I have to use to google for solution? I found [link](http://stackoverflow.com/questions/7354171/why-is-my-string-value-accumulating-empty-spaces-when-inserting-into-sql-databas) which doesnt help me. – user3509426 Jan 30 '15 at 10:04
-1

DELETE n1 FROM tablename n1, tablename n2 WHERE n1.email = n2.email

vignesh
  • 97
  • 5