0

I have a database that looks like this:

enter image description here

I need to delete entries like the 2nd to the 7th, I would only want one of those. They are all the same if the movie_id, question_id and value are the same.

I am currently doing the following, and it works until it times out but it times out after only a thousand or so entries, as you can tell by the ID column, there are over 50,000 entries.

$top_index = count($all_movies)-1;$top = $all_movies[$top_index];
$max = $top->id;
for($i = 25200; $i<$max-1;$i++){
    for($j = 0; $j<$max-1;$j++){
        if($i != $j){
            if($all_movies[$i]->movie_id == $all_movies[$j]->movie_id){
                if($all_movies[$i]->question_id == $all_movies[$j]->question_id){
                    if($all_movies[$i]->value == $all_movies[$j]->value){
                        echo 'Need to remove '. $all_movies[$j]->id.':<br> Movie Id:'.$all_movies[$i]->movie_id.' Question ID: '.$all_movies[$i]->question_id.' Value: '.$all_movies[$i]->value.'<br>';
                        echo 'Matched with: <br>'. $all_movies[$i]->id.': Movie Id:'.$all_movies[$j]->movie_id.' Question ID: '.$all_movies[$j]->question_id.' Value: '.$all_movies[$j]->value.'<br>';
                        $delete = $post2->movie_value_delete($all_movies[$j]->id);
                        echo 'Deleted: '.$all_movies[$j]->id.'<br><br>';
                    }
                }
            }
        }
    }
}
Diego Agulló
  • 9,298
  • 3
  • 27
  • 41
Daniel Fein
  • 317
  • 1
  • 6
  • 18
  • 5
    You can do this in pure SQL. You'll have much better results. – Danny Beckett Mar 03 '13 at 01:56
  • Should something like this work? Database table name is movies_values: DELETE n1 FROM movies_values n1, movies_values n2 WHERE n1.id < n2.id AND n1.movie_id = n2.movie_id AND n1.value = n2.value AND n1.question_id = n2.question_id – Daniel Fein Mar 03 '13 at 02:00
  • Try it! Take a copy of your db and play with it! Just make sure you're playing with the copy, and not the live data! :p – Danny Beckett Mar 03 '13 at 02:01

2 Answers2

3

Something like this should be blazing fast compared to what you're doing. PLEASE take a backup before running it though :)

DELETE FROM movies_values
WHERE id NOT IN (
    SELECT MIN(id)
    FROM movies_values
    GROUP BY movie_id, question_id, value
)

Lamefix for being unable to write to the table you're reading from (2am, gimme a break!) - copy movies_values_test to movies_values_test2 so we can select the unique IDs from _test2 and remove any unmatched rows from _test.

DELETE FROM movies_values_test
WHERE id NOT IN (
    SELECT MIN(id)
    FROM movies_values_test2
    GROUP BY movie_id, question_id, value
)
Joe
  • 15,669
  • 4
  • 48
  • 83
  • I'm getting this error: any ideas? "You can't specify target table 'movies_values_test' for update in FROM clause" I made a new table with the same values called movies_values_test – Daniel Fein Mar 03 '13 at 02:04
  • Oh aye, this old fun one. You can't update the same table as you're selecting from. Easiest fix in this situation is copy the table again and call it `movies_values_test2` then update the SELECT to use that (while leaving the DELETE using just `_test`). It's also possible to modify the query to be cleverer, but it's 2am :P – Joe Mar 03 '13 at 02:07
  • So just fake it by getting the ID from movies_values_test2 and deleting that ID from movies_values_test? – Daniel Fein Mar 03 '13 at 02:09
  • Yep - I'll update the answer now with an example (keep smacking refresh till it updates) – Joe Mar 03 '13 at 02:11
  • You could just INSERT INTO newtable (SELECT MIN(id)...) and rename the table. – Popnoodles Mar 03 '13 at 02:14
  • Ok I am running the new movies_values_test2 query. It's not blazing like you said it should be, but will let you know the outcome, thanks! – Daniel Fein Mar 03 '13 at 02:16
  • 1
    @pop That's probably a more elegant solution :) Also Daniel I said blazing compared to your current one :P (covered my ass on that one lol) 50k rows will take a little time whatever way you do it. – Joe Mar 03 '13 at 02:17
  • Probably faster too because you're only inserting the rows you want. – Popnoodles Mar 03 '13 at 02:33
2

With

DELETE mv2
    FROM movies_values mv1
        JOIN movies_values mv2 USING (movie_id, value, question_id)
    WHERE mv1.id < mv2.id

you combine the two tables and get every combination of equivalent records. Then you delete every record which has a "partner" with a smaller id.

Don't forget to set a UNIQUE KEY afterwards in order to prevent this from happening again...

glglgl
  • 89,107
  • 13
  • 149
  • 217