Situation:
- Old scripts added rows to a table without deleting existing rows.
- Need to discover "duplicate" rows (based on matching two fields).
- For each set of duplicate rows, sort by ids and return all but the newest one (highest id).
- Each row has an associated external file, so can't simply delete the older rows - need to return a list of all the older rows, which will then be processed by a php script.
Example:
TABLE mytable:
ID A B Filename
1 10 abc aa.png
2 11 dddd bb.xml
3 10 abc cc.png
4 10 dddd dd.png
5 10 abc ee.xml
6 11 dddd ff.xml
Rows with IDs 1 & 3 & 5 are duplicates (both A and B match). Similarly, 2 & 6 are duplicates. Return list (1, 2, 3) - these are the "older" rows that need to be processed.
Even better: return a set of records, containing 'ID' and 'Filename' for those rows.
My primary question is an SQL query that does this, though it would also be useful to me to see how to use the result of that query in php.
There are existing stackoverflow posts related to deleting duplicate rows, but the ones I found delete the rows directly. This won't work for me, as I need to have the external php script delete the corresponding external files:
Deleting Duplicate Rows from MySql Table
How to delete duplicate records in mysql database?
How to delete all the duplicate records in PHP/Mysql
IMPORTANT: The other posts which I quote don't bother to distinguish newer from older; they are about removing fully duplicate records, but that is not my situation. I have records which are partially duplicates; that is, several records match the specified criteria, but there is important information in other fields, hence I have to know which is newest (highest id) for each value of criteria; those are the ones to keep.