I see a lot of ways to find duplicate records and delete them but I could not find the one I am looking for i.e. to delete them based on certain condition.
The first SQL code below taken from another post works very well in locating duplicates. However, I am having problem withe the second part.
<cfquery datasource="mydatabase" name="duplist">
SELECT a.*, b.totalCount AS Duplicate
FROM mytable a
INNER JOIN
(
SELECT email, COUNT(*) totalCount
FROM mytable
GROUP BY email
HAVING COUNT(*) > 1
) b ON a.email = b.email
</cfquery>
<cfoutput query="duplist">
<CFQUERY DATASOURCE="mydatabase" name="dlist">
SELECT * FROM mytable
WHERE userid = '#userid#'
AND activedate is null
</CFQUERY>
</cfoutput>
<cfoutput>
Total To Be Deleted: #dlist.recordcount#
</cfoutput>
I am trying use SELECT (before replace it with DELETE) only those duplicates user emails that do not have an "activedate" based on the output query "duplist" with duplicated email records. I am only getting one record (in #dlist.recordcount#) which is the last record! Isn't the query supposed to loop through them all?
EDITED:
After some feedback and checking the recordcount wasn't counting correctly because it is outside the loop. It will work in retrieving those records without startdate. However, now the real problem is in the following scenario:
ID USERID EMAIL STARTDATE
1 user1 test@test.com
2 user2 test@test.com 11/01/2014
3 user3 test@test.com
4 user4 test@test.com 11/02/2014
5 user5 me@mydomain.com
6 user6 me@mydomain.com
Running through the above code will list these 6 records of duplicate emails. The second query will select(delete) those without startdate i.e. Record 1, 3, 5 and 6. Two problems:
- Record 1 and 3 will be deleted, but 2 and 4 are still duplicates
- Record 5 and 6 will be deleted and I am left with no user with the email. I need to keep one of them.
How can the codes be modified to deal with the scenarios? I want to keep record #4 and #6 because 4 is newer than 2 and 6 is newer than 6.
4 user4 test@test.com 11/02/2014
6 user6 me@mydomain.com