1

I have duplicate results like below where some column may have data and may not

| contact_info | icon | id  | title         | lastmodified_by  |
+--------------+------+-----+---------------+------------------+
|          169 |  305 | 123 | Whakarewarewa | 2011100400305262 |
|         NULL | NULL | 850 | Whakarewarewa | NULL             |
+--------------+------+-----+---------------+----------------



| contact_info | icon | id  | title         | lastmodified_by  |
+--------------+------+-----+---------------+------------------+
|         NULL | NULL | 123 | Paris         | NULL             |
|         NULL | NULL | 850 | Paris         | NULL             |
+--------------+------+-----+---------------+----------------

I want to delete record having less Data and if the all the Field values are exact same then delete any row. There are thousand records like this.

saun jean
  • 739
  • 3
  • 12
  • 25
  • It is obvious that there would be multiple records else i wouldn't have asked this question here on stackoverflow. – saun jean Jun 05 '12 at 08:29
  • This question was askwed trillions of time, [here](http://stackoverflow.com/questions/672702/how-to-delete-duplicates-in-mysql-table), [here](http://stackoverflow.com/questions/2469006/how-to-delete-duplicates-in-mysql-using-case), [here](http://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-mysql-table), [here](http://stackoverflow.com/questions/3383898/remove-duplicates-using-only-a-mysql-query), [here](http://stackoverflow.com/questions/6308924/modifying-a-mysql-query-that-removes-duplicates)... – lorenzo-s Jun 05 '12 at 08:44
  • This question is different from all those. Here there are 3 cases 1) Columns having exactly same data 2) One column is having data and other not . 3) Cannot delete all min ids or max ids because data can be either in max id or min id – saun jean Jun 05 '12 at 08:48
  • @lorenzo-s this links describe how to remove duplicates, but does not consider NULL values - 'record having less Data'. – Devart Jun 05 '12 at 08:49
  • @saunjean Ok, but at least you could have tried something starting from these answer, and then posting your code together with question. – lorenzo-s Jun 05 '12 at 08:49

3 Answers3

3

Try this two-step solution:

Run this query to vew all duplicates - record having less Data -

SELECT t1.* FROM table t1
  JOIN (
    SELECT
      title,
      MIN(IF(contact_info IS NULL, 0, 1) + IF(contact_info IS NULL, 0, 1) + IF(lastmodified_by IS NULL, 0, 1)) min_value_data,
      MAX(IF(contact_info IS NULL, 0, 1) + IF(contact_info IS NULL, 0, 1) + IF(lastmodified_by IS NULL, 0, 1)) max_value_data
    FROM table GROUP BY title HAVING min_value_data <> max_value_data
  ) t2
  ON t1.title = t2.title AND IF(t1.contact_info IS NULL, 0, 1) + IF(t1.contact_info IS NULL, 0, 1) + IF(t1.lastmodified_by IS NULL, 0, 1) <> t2.max_value_data

Rewrite it to DELETE statement and execute.


Then run this query to remove all duplicates except min ID:

DELETE t1 FROM table t1
  JOIN (SELECT MIN(id) id, title FROM table GROUP BY title) t2
    ON t1.id <> t2.id AND t1.title = t2.title;
Devart
  • 119,203
  • 23
  • 166
  • 186
  • What if any record with min id has no data but max id has data in it. It will delete id with max id and keep id with min id??? – saun jean Jun 05 '12 at 08:45
  • Yes, you are right. I read your edited question; and the query should be rewrited. – Devart Jun 05 '12 at 08:53
  • It is giving this "ERROR 1052 (23000): Column 'title' in field list is ambiguous" . Then i thought to put t1.title in starting of the query but then it gives "ERROR 1054 (42S22): Unknown column 't1.value_data' in 'on clause'" – saun jean Jun 05 '12 at 09:42
  • Add table alias (the field should be t1.title). – Devart Jun 05 '12 at 09:55
  • I Tried the same thing before posting the comment and the error is still the same "ERROR 1054 (42S22): Unknown column 't1.value_data' in 'on clause'" – saun jean Jun 05 '12 at 09:57
  • Similar alias error. There may be some errors because I did not execute these queries. I changed my answer. – Devart Jun 05 '12 at 10:32
1

Use this to select duplicates, feel free to alter this to a delete statement:

SELECT * FROM `test`,
(SELECT title, count( title ) AS ttl
FROM `test`
GROUP BY title
HAVING ttl >1) AS sub

WHERE test.title = sub.title
AND contact_info IS NULL AND lastmodified_by IS NULL 
Ramon Fincken
  • 253
  • 1
  • 5
  • 11
  • This query fails in the case Where title=Paris . It lists both the Paris in the Result. This is the case where both results are exact duplicates including all fields – saun jean Jun 05 '12 at 09:05
  • 1
    Correct, I did not see your Paris example. How about this: Showing rows 0 - 1 (2 total, Query took 0.0007 sec) SELECT * FROM `test` AS main, ( SELECT title, count( title ) AS ttl FROM `test` GROUP BY title HAVING ttl >1 ) AS sub WHERE main.title = sub.title AND main.contact_info IS NULL AND main.lastmodified_by IS NULL GROUP BY main.contact_info, main.icon, main.title, main.lastmodified_by – Ramon Fincken Jun 05 '12 at 09:16
  • This query is almost correct except the part that Table in which rows to be modified or deleted cannot be referenced in Subquery.so i cannot delete using this query – saun jean Jun 05 '12 at 09:50
  • See my new answer below. using more queries. – Ramon Fincken Jun 05 '12 at 11:11
0

Main table = tes1

Create temp

CREATE TEMPORARY TABLE my_temp ( id INT(20) NOT NULL ) ENGINE=MEMORY;

Fill with id's to remove

INSERT INTO my_temp (id) SELECT id FROM tes1 AS main, ( SELECT title, count( title ) AS ttl FROM tes1 GROUP BY title HAVING ttl >1 ) AS sub WHERE main.title = sub.title AND main.contact_info IS NULL AND main.lastmodified_by IS NULL GROUP BY main.contact_info, main.icon, main.title, main.lastmodified_by;

Delete!

DELETE FROM tes1 WHERE id IN (select id from my_temp);

Cleanup, note: do we really need this?

DROP TABLE my_temp;

Ramon Fincken
  • 253
  • 1
  • 5
  • 11