I have a table “table-A” with duplicate records like (duplicates based on “name” column) "`
ID Name Course
01 abc dotnet
02 xyz java
03 mno sas
04 abc dotnet
05 xyz java
06 abc dotnet
07 aaa testing
08 bbb sap
--- “abc” came 3 times (id—1,4,6)
--- “xyz” came 2 times (id – 2,5)
`" From above table I need to delete duplicates (with ID -1, 4, 2) — not IDs 6, 5 those are latest files. “table-A” should contain only--
ID Name Course
03 mno sas
05 xyz java
06 abc dotnet
07 aaa testing
08 bbb sap
I tried like—
CREATE TEMPORARY TABLE temptable (idTemp int(12), totTemp int(4));
INSERT INTO temp_table(`idTemp`, `totTemp`) select max(ID), count(*) as tot from table-A
group by Name, Course having tot > 1 or tot =1 order by ID ;
Delete from table-A where ID not in (select idTemp from temp_table);
Above code is working. But it’s taking very huge time on large data table. My table contains 200,000 records with 40+ columns and every month it’s added 20,000 records.
I need to find duplicates based on 10 columns (group by on 10 columns) in this case anyone suggest well and fast working code.
I find different logics on internet and tried but they are taking more time than I explained above.
My main concern is query execution time. So please suggest me good logic or query which will do above task fast.
(just for information:-- i find few logics in stackoverflow also, the best one is suggesting to make unique index on thous columns. but in my case the data is coming from government every month, it may contain duplicates in file as well as compare to database. so i need to delete the duplicates or show duplicates in grid (using asp.net). )