I'm trying to write a Laravel eloquent statement to do the following.
Query a table and get all the ID's of all the duplicate rows (or ideally all the IDs except the ID of the first instance of the duplicate).
Right now I have the following mysql statement:
select `codes`, count(`codes`) as `occurrences`, `customer_id` from `pizzas`
group by `codes`, `customer_id`
having `occurrences` > 1;
The duplicates are any row that shares a combination of codes
and customer_id
, example:
codes,customer_id
183665A4,3
183665A4,3
183665A4,3
183665A4,3
183665A4,3
I'm trying to delete all but 1 of those.
This is returning a set of the codes, with their occurrences and their customer_id, as I only want rows that have both.
Currently I think loop through this, and save the ID of the first instance, and then call this again and delete any without that ID. This seems not very fast, as there's about 50 million rows so each query takes forever and we have multiple queries for each duplicate to delete.
// get every order that shares the same code and customer ID
$orders = Order::select('id', 'codes', DB::raw('count(`codes`) as `occurrences`'), 'customer_id')
->groupBy('codes')
->groupBy('customer_id')
->having('occurrences', '>', 1)
->limit(100)
->get();
// loop through those orders
foreach ($orders as $order)
{
// find the first order that matches this duplicate set
$first_order = Order::where('codes', $order->codes)
->where('customer_id', $order->customer_id)
->first();
// delete all but the first
Order::where('codes', $order->codes)
->where('customer_id', $order->customer_id)
->where('id', '!=', $first_order->id)
->delete();
}
There has got to be a more efficient way to track down all rows that share the same code and customer_id, and delete all the duplicates but keep the first instance, right? lol
I'm thinking maybe if I can add a fake column to the results that is an array of every ID, I could at least then remove the first ID and delete the others.