Deleting duplicate rows on MySQL (leaving at least one)

Question

I have this table in MySQL:

id_word
lang_original (the language from the original word) VARCHAR(2)
lang_target (the language from the translated word) VARCHAR(2)
word (the word itself) VARCHAR(50)
translation (the translation) VARCHAR(50)

They should not have duplicates. Is it possible to have a sql query that finds duplicates and deletes them (leaving the first match undeleted)?

Update a duplicate would be something that has the same lang_original,lang_target and word (only those 3 fields).

possible duplicate of [SQL - How can I remove duplicate rows?](http://stackoverflow.com/questions/18932/sql-how-can-i-remove-duplicate-rows) and many, many others. — Andriy M, Dec 08 '11 at 08:52

Erwin Brandstetter · Answer 1 · 2011-12-08T04:22:19.800

1

Could work like this:

DELETE FROM tbl
WHERE  EXISTS (
   SELECT *
   FROM   tbl t
   WHERE (t.lang_original,   t.lang_target,   t.word)
       = (tbl.lang_original, tbl.lang_target, tbl.word)
   AND tbl.id_word > t.id_word
   )

If @Jason is right, and MySQL does not allow to reference the delete table, here is another form that works independently:

DELETE FROM tbl
USING (
   SELECT min(id_word) AS min_id, lang_original, lang_target, word
   FROM   tbl t
   GROUP  BY lang_original, lang_target, word
   HAVING count(*) > 1
   ) x
WHERE (tbl.lang_original, tbl.lang_target, tbl.word)
   =  (  x.lang_original,   x.lang_target,   x.word)
AND    tbl.id_word > x.min_id

Both variants leave the duplicate with the smallest id alive and kill the rest.

If you want to save all your translations to the word with the smallest id in a group of dupes first:

UPDATE tbl SET translation = all_trans
FROM  (
   SELECT min(id_word) AS min_id, group_concat(translation) AS all_trans
   FROM   tbl
   GROUP  BY lang_original, lang_target, word
   HAVING count(*) > 1
   ) t
WHERE  tbl.id_word = t.min_id

edited Dec 08 '11 at 04:22

answered Dec 08 '11 at 03:45

Erwin Brandstetter

605,456
145
1,078
1,228

Please define "duplicate" and tell us about the primary key of the table. – Erwin Brandstetter Dec 08 '11 at 03:46
Does this fall victim to the *can't reference the update table* constraint? – Jason McCreary Dec 08 '11 at 03:47
@JasonMcCreary: Not sure. You mention "update", but it's a delete. Anyway, thanks for the hint. I provided an alternative version, just in case. – Erwin Brandstetter Dec 08 '11 at 04:02
sorry, actually a duplicate would be something that has the same lang,target and word (only those 3 fields). – Andres SK Dec 08 '11 at 04:06
Which of the dupes would you want to keep? The one with the smallest id? The one with the longest word in `translation`? Or write a comma-separated list of all translations to the word with the smallest id first? – Erwin Brandstetter Dec 08 '11 at 04:08

vik · Accepted Answer · 2011-12-12T17:56:07.903

It's simpler to create a new table. Previous answers are good, but I like it this way: Create a new table with a unique key for "lang_original, lang_target, word"

CREATE TABLE new_table_can_be_renamed_later ( 
  ..your-fields...
  UNIQUE unique (lang_original,lang_target,workd)
);

Then fill your new table with by selecting the old table and use IGNORE in your INSERT

INSERT IGNORE INTO new_table_can_be_renamed_later 
  SELECT * FROM original_table

Please consider Mysql docs for right syntax.

score 0 · Answer 3 · answered Dec 08 '11 at 03:45

0

I'm not sure that you can do that. You are probably better off doing something like

select distinct * into yournewtable from originaltable

That may work.

answered Dec 08 '11 at 03:45

Johnny Rocket

1,394
3
17
25

it would be to heavy, there are hundreds of thousands of words. – Andres SK Dec 08 '11 at 04:07

Deleting duplicate rows on MySQL (leaving at least one)

3 Answers3