0

I updated an old site a couple of months ago moving from a Joomla install to a bespoke system. In order to convert the data from the Joomla tables to the new format I wrote various php scripts which stepped through the old records by section, processed them and inserted into the new table. All was fine until I recently discovered I had forgotten to add the die() statement to the top of one of the scripts and somehow a searchbot has been merrily pinging that script over time to add precisely 610 duplicates in one particular section.

So the things I do know about the data is that the row with the lowest ID is the row I want to keep, and the duplication only exists in CATEGORY = 8. To be sure of a duplicate, the row ORIGINAL_ID will match.

Beyond SELECT, INSERT, DELETE, I'm no MySQL expert, so confused as to how to approach this. What would the experts out their suggest?

Edit: Example code

ID    CATEGORY    TITLE    ORIGINAL_ID
1     7           A        1
2     8           A        2
3     8           A        2
4     8           B        3
5     8           C        4
6     8           A        2

In the above example, records 3 & 6 should be stripped, because they are in CATEGORY=8, have duplicate ORIGINAL_ID; but retain the row with the lowest id (row 2)

PeteSE4
  • 309
  • 1
  • 4
  • 21

1 Answers1

0

So, you want to identify records within Category 8, where there is another record with the same Category, Title and Original_id. You also want to check if that other record has a lower ID.

So:

Select * 
from MYTABLE T1
where CATGEORY = 8
  and EXISTS (
        select 1 
        from MYTABLE T2 
        where T2.CATEGORY=T1.CATEGORY
          and T2.TITLE=T1.TITLE
        where T2.ORIGINAL_ID=T1.ORIGINAL_ID
        where T2.ID>T1.ID

If you run this and it returns only the records you wish to delete, replace the "select *" with a "delete" and re-run.

Darius X.
  • 2,886
  • 4
  • 24
  • 51
  • Love the simplicity of this, although it appears to return all the duplicated records including the lowest ID. I changed those additional 'WHERE's to 'AND's since it was giving me a syntax error as written. Can't figure out why the T2.ID>T1.ID isn't doing what I both think it ought to. – PeteSE4 May 17 '14 at 11:51
  • OK, getting the right results if I change T2.ID>T1.ID to T2.ID – PeteSE4 May 17 '14 at 12:06
  • You may not call delete on a table while having a where clause selecting from the same table. Go for a temporary table approach. – rekaszeru May 17 '14 at 20:12
  • Ah! Sorry, didn't realize MySQL didn't allow that. Learnt something new. – Darius X. May 19 '14 at 12:59