I have a table in a Magento database called sales_bestsellers_aggregated_monthly
.
This table has over 1,500,000 rows due to an error a few years ago that created multiple unique records every day for a single product.
I want to clean the database so that only one record exists per period
, store_id
and product_id
, like this:
I tried 2 different queries but both seem to run for hours and then time out. I am wondering what is the optimal query to run on a large amount of rows?
Here is what I have tried based on SO research:
DELETE `duplicated`.*
FROM `sales_bestsellers_aggregated_monthly` AS `orig`
INNER JOIN `sales_bestsellers_aggregated_monthly` AS `duplicated`
ON (`orig`.`period` = `duplicated`.`period`
AND `orig`.`store_id` = `duplicated`.`store_id`
AND `orig`.`product_id` = `duplicated`.`product_id`);
I also tried this one:
DELETE a
FROM sales_bestsellers_aggregated_monthly as a, sales_bestsellers_aggregated_monthly as b
WHERE
(a.period = b.period OR a.period IS NULL AND b.period IS NULL)
AND (a.store_id = b.store_id OR a.store_id IS NULL AND b.store_id IS NULL)
AND (a.product_id = b.product_id OR a.product_id IS NULL AND b.product_id IS NULL)
AND a.id < b.id;
I am currently doing this locally using MAMP Pro and Sequel Pro. The table I am trying to clean is over 110Mb. I don't know if this is a factor?