2

I have a table with duplicate entries which I want to remove, keeping none but one of the duplicate entries. As you can see, they are exactly the same in every colum, there's no way to differentiate them:

enter image description here

I've used this query to figure out how many duplicates I have:

select url_rewrite_id, category_id, product_id, count(*) cnt
from catalog_url_rewrite_product_category
group by url_rewrite_id, category_id, product_id
having cnt > 1
order by cnt desc

I could use a variant of this to remove all duplicates:

delete
from catalog_url_rewrite_product_category
where url_rewrite_id in (
    select url_rewrite_id
    from catalog_url_rewrite_product_category
    group by url_rewrite_id, category_id, product_id
    having count(*) > 1
)

The problem I have with this is it would remove all entries that are duplicates and wouldn't keep the last one.

Earlier questions (here and here) assume a unique id column of sorts which is not the case with the data structure I have.

kolaente
  • 1,252
  • 9
  • 22

1 Answers1

3

Have you tried one of the solutions I read in the thread you shared (here)?

ALTER IGNORE TABLE jobs
ADD UNIQUE INDEX idx_name (site_id, title, company);

I think this would work.

  • I was just about to post this! That does indeed work. Full query that worked for me was: `alter ignore table catalog_url_rewrite_product_category add unique index tmp_remove_key(url_rewrite_id, category_id, product_id)` – kolaente Nov 17 '21 at 15:05