0

I have a table “table-A” with duplicate records like (duplicates based on “name” column) "`

ID  Name        Course
01  abc     dotnet
02  xyz     java
03  mno     sas
04  abc     dotnet
05  xyz     java
06  abc     dotnet
07  aaa     testing
08  bbb     sap

--- “abc” came 3 times (id—1,4,6)
--- “xyz” came 2 times (id – 2,5)

`" From above table I need to delete duplicates (with ID -1, 4, 2) — not IDs 6, 5 those are latest files. “table-A” should contain only--

 ID Name        Course
03  mno     sas
05  xyz     java
06  abc     dotnet
07  aaa     testing
08  bbb     sap

I tried like—

CREATE TEMPORARY TABLE temptable (idTemp int(12), totTemp int(4));

INSERT INTO temp_table(`idTemp`, `totTemp`)   select  max(ID), count(*) as tot  from table-A 
       group by Name, Course having tot > 1 or tot =1  order by ID ;

Delete from table-A where ID not in (select  idTemp  from temp_table);

Above code is working. But it’s taking very huge time on large data table. My table contains 200,000 records with 40+ columns and every month it’s added 20,000 records.

I need to find duplicates based on 10 columns (group by on 10 columns) in this case anyone suggest well and fast working code.

I find different logics on internet and tried but they are taking more time than I explained above.

My main concern is query execution time. So please suggest me good logic or query which will do above task fast.

(just for information:-- i find few logics in stackoverflow also, the best one is suggesting to make unique index on thous columns. but in my case the data is coming from government every month, it may contain duplicates in file as well as compare to database. so i need to delete the duplicates or show duplicates in grid (using asp.net). )

vishnu reddy
  • 103
  • 1
  • 4
  • 9
  • Your main concern is time, so we better hurry up doing the job for you? http://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql, http://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql. In other words, it's a duplicate of at least several questions. – Mike Feb 20 '14 at 08:42

4 Answers4

0
ALTER IGNORE TABLE table_a ADD UNIQUE INDEX index_123 (name, course );

this will drop duplicate rows. and makes inserts with duplicate data give an error, but make sure to take a backup before running this query

Saddam Abu Ghaida
  • 6,381
  • 2
  • 22
  • 29
0

You must delete the duplicates manually, to prevent this in the future you have to make the values that should not be duplicated unique, i.e users cannot share the same number, employee number etc but they can share the same name and surname. Read up on the unique property with SQL

bongzlive
  • 24
  • 2
0
DELETE t1.*
FROM
  tableName t1 INNER JOIN tableName t2
  ON t1.Name=t2.Name
     AND t1.ID < t2.ID

Please see fiddle here.

fthiella
  • 48,073
  • 15
  • 90
  • 106
0

I think you should create an unique index on these fields to avoid duplicates on INSERT

Here is the query to delete duplicates:

DELETE FROM T WHERE ID NOT IN
(SELECT MAX(ID) FROM (SELECT * FROM T) T1 GROUP BY Name)

SQLFidddle demo

One more way:

DELETE T1
FROM T as T1
LEFT JOIN (SELECT MAX(ID) as ID FROM T GROUP BY Name) as T2
  ON T1.Id=T2.Id
Where T2.id is null

SQLFidddle demo

valex
  • 23,966
  • 7
  • 43
  • 60