3

I need to delete all the duplicated records from one of my tables the problem is that there isn't any id or unique or key column so I can't make something like this:

delete from tbl using tbl,tbl t2 where tbl.locationID=t2.locationID
  and tbl.linkID=t2.linkID  and tbl.ID>t2.ID

because it needs an id column or unique or key column and I can't make an

ALTER IGNORE TABLE 'mytable' ADD UNIQUE INDEX 

because there is information that will be always necessary duplicated but others don't and I can't make this:

DELETE FROM 'table' WHERE 'field' IN (SELECT 'field' FROM 'table' GROUP BY 'field'HAVING (COUNT('field')>1))

because it will delete all the duplicated and never will leave one this is an example of my table


+----------+----------------------+-------------+-------------+
| phone    | address              | name        | cellphone   |
+----------+----------------------+-------------+-------------+
| 2555555  | 1020 PANORAMA        | JUAN CARLOS | 0999999999  | diferent address
| 2555555  | GABRIEL JOSE 1020    | JUAN CARLOS | 0999999999  | good one
| 2555555  | GABRIEL JOSE 1020    | JUAN CARLOS | 0999999999  | duplicated
| 2555555  | C ATARAZANA 1020     | SILVIA      | 0777777777  | another good one
| 2555555  | C ATARAZANA 1020     | SILVIA      | 0777777777  | another duplicated
| 2555555  | GABRIEL JOSE 1020    | VIOLETA     | 0888888888  | diferent person
+----------+----------------------+-------------+-------------+

and this is what I want to leave


+----------+----------------------+--------------+-------------+
| phone    | address              | name         | cellphone   |
+----------+----------------------+--------------+-------------+
| 2555555  | 1020 PANORAMA        | JUAN CARLOS  | 0999999999  |
| 2555555  | GABRIEL JOSE 1020    | JUAN CARLOS  | 0999999999  |
| 2555555  | C ATARAZANA 1020     | SILVIA       | 0777777777  |
| 2555555  | GABRIEL JOSE 1020    | VIOLETA      | 0888888888  |
+----------+----------------------+--------------+-------------+

and I can't truncate or delete the original table because its used 24/7 and has 10000000 records....

Please help me.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
ss4sgoku
  • 57
  • 1
  • 5
  • possible duplicate of [Remove Duplicate Rows Leaving Oldest Row Only?](http://stackoverflow.com/questions/3671607/remove-duplicate-rows-leaving-oldest-row-only) – Ashesh Kumar Oct 12 '14 at 02:39
  • nop it doesnt work with it ... that example works with tables that have something like an id column or key column or unique column mine dont have it and cant change it – ss4sgoku Oct 12 '14 at 02:47

4 Answers4

7

Adding a unique index (with all the columns of the table) with ALTER IGNORE will get rid of the duplicates:

ALTER IGNORE TABLE table_name
  ADD UNIQUE INDEX all_columns_uq
    (phone, address, name, cellphone) ;

Tested in SQL-Fiddle.

Note: In version 5.5 (due to a bug in the implementation of fast index creation), the above will work only if you provide this setting before the ALTER:

SET SESSION old_alter_table=1 ;
Andriy M
  • 76,112
  • 17
  • 94
  • 154
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • that cant be do.... there will be always exists some those columns with duplicated info for example in the same telephone number (house) will be 3 persons.... so i cant make unique the phone number, the address cant be unique because if its a building there can live many families with diferents phones... so they it cant be used.... :( – ss4sgoku Oct 12 '14 at 04:15
  • +1 from me too.. I tried your solution but didnt seem to get it working.. I like the approach though [**ATTEMPT**](http://sqlfiddle.com/#!2/dd86a/1) – John Ruddell Oct 12 '14 at 17:34
  • @JohnRuddell It works in version 5.1 but not in 5.5 in SQLFiddle. Probably has to do with some MySQL setting. `IGNORE` is under deprecation in 5.6 and removed from 5.7 – ypercubeᵀᴹ Oct 12 '14 at 18:31
  • No, if you remove `IGNORE`, the duplicates will produce an error for sure. See the linked sqlfiddle in my answrer, where it works, with IGNORE. – ypercubeᵀᴹ Oct 12 '14 at 19:06
  • Yea ive never used ignore but i was assuming thats what it did.. I like this approach as its pretty simplistic.. And its not adding a primary key just an index imo this should be the accepted answer! – John Ruddell Oct 12 '14 at 21:24
1

its pretty simple just make a temporary table and drop the other table then recreate it

CREATE TEMPORARY TABLE IF NOT EXISTS no_dupes AS 
(SELECT * FROM test GROUP BY phone, address, name, cellphone);

TRUNCATE table test;
INSERT INTO test (phone, address, name, cellphone) 
SELECT phone, address, name, cell FROM no_dupes;

WORKING DEMO

John Ruddell
  • 25,283
  • 6
  • 57
  • 86
  • that can work if it use a select distinct but i cant delete the original table.... its a good options but not in my case.... :( – ss4sgoku Oct 12 '14 at 04:18
  • @ss4sgoku check my edit... try truncate which will completely empty the table but not delete it – John Ruddell Oct 12 '14 at 04:33
  • @ss4sgoku also you dont need distinct because the group by handles that – John Ruddell Oct 12 '14 at 04:46
  • yes its a good method but i cant truncate or delete de original one.... and it have 10000000 rows so it will take a lil time and processing power to remake etc... thats why im still searching for a way.... – ss4sgoku Oct 12 '14 at 05:50
  • @ss4sgoku however you do it, it will take some time, if the table is large. 10M rows is not that big, by the way. – ypercubeᵀᴹ Oct 12 '14 at 11:21
  • @ss4sgoku truncate is just emptying the table... you could also just delete all the rows if you want... can you not do that? I thought you had deleting privileges... the logic is save everything you want to keep in a temporary table and then delete everything from the master table then insert into the master table the stuff from the temp table... I don't know why you can't do this – John Ruddell Oct 12 '14 at 13:53
  • +1 for this answer. I'd guess they need `GROUP BY **phone,** address, name, cellphone` instead but still, the essence is the same. – ypercubeᵀᴹ Oct 12 '14 at 16:36
  • @ypercube thanks, i was assuming that the people living in each house with a different phone wouldnt have the same address name and cellphone number since those are generally unique but i added it anyways just to be safe :) – John Ruddell Oct 12 '14 at 17:28
0

I'd use sub query. Something like:

DELETE FROM table1
WHERE EXISTS (
SELECT field1 
FROM table1 AS subTable1 
WHERE table1.field1 = subTable1.field1 and table1.field2 = subTable1.field2)

Haven't try this out though.

John Ruddell
  • 25,283
  • 6
  • 57
  • 86
Brain Balaka
  • 123
  • 1
  • 4
  • Ah my bad.. What if we put the result of the sub query into some temporary table (after we use distinct), delete all the duplicated records, then add the record from our temporary table to the original one.. This might not seem straight forward though, but the only way I could think of. Would love to see another answer:) – Brain Balaka Oct 12 '14 at 04:24
  • thats a good one .... i never try something like that but it can work, i create temporary delete all duplicated from the original one and insert from temporary its a good solution – ss4sgoku Oct 12 '14 at 06:06
-2

there is always a PK per table but you can combine columns as an unique id, so it's possible use a full row as a unique id if you want to... but I don't recommend use a full row, you should search what are the most significant columns that you can use a PK, when you have done that, you can copy the data, if there is no problem the mysql won't copy the duplicate rows.

sorry for my bad english

Lord_Gama
  • 287
  • 2
  • 4
  • 12
  • there must be always a pk.... but this one dont have and i cant alter it... so i dont have a normal way to delete the duplicated rows... thats way im having a headche.... i can find all the duplicated and some records are repeated 900 times others 400 and others only 3 or 2 times but i need to leave only one – ss4sgoku Oct 12 '14 at 02:56