I am trying to delete duplicate records from a table, always removing the older ones. This runs in a rake task.

The main part looks like this:

    TableName.order(updated_at: :asc).each do |record|
      next if record.valid?
      record.destroy!
    end

If more than two duplicate rows exist, this loop deletes only one of them and exits.

To understand it, I stepped through the loop line by line in the debugger, and ta-da, everything works. Probably the loop moves on to the next iteration before the `destroy!` has taken effect. And it probably deletes only the last record (not the last item of every duplicate group).

Anyway, I can fix it by building an array and destroying everything at once or something similar, but I'm curious why Ruby/ActiveRecord behaves like this.

A detailed explanation of what's going on would be appreciated. Thanks.

Baran Yeni
  • Quick question(s): is there a particular reason you're using the "bang" `!`? Do you get the same issue with `#destroy` or `#delete`? Do you try to trap any exceptions that are generated? – Jad Mar 15 '23 at 22:06
  • Hi Jad, I used the bang to see if anything fails, but nothing fails; it's just the problem I described. And yes, I even tried `update_all(deleted_at: Time.zone.now)` and the same thing happens; with `delete` the same will probably happen. (Reloading the object did not help either.) Please check this: https://www.ruby-forum.com/t/objects-wont-destroy-when-inside-of-a-loop/109242 – Baran Yeni Mar 15 '23 at 22:09
  • Sounds suspicious. Any chance we can see the model? – Jad Mar 15 '23 at 22:10
  • Moreover, I added a `reload` call and printed the `deleted_at` field. Even though the delete is called before the reload, `deleted_at` comes back empty. And everything works fine when I add a `sleep` as the last line of each iteration. So the delete is acting as if it were async, but even if that's the case, how come? – Baran Yeni Mar 15 '23 at 22:13
  • This is a really strange way to eliminate duplicates. It would be a lot more efficient to just write a database query to find the duplicates and call delete_all instead to do it in one or two database queries total. What you're doing will at least do N+2 database queries - it could be even more depending on what validations you have. https://stackoverflow.com/questions/28156795/how-to-find-duplicate-records-in-postgresql https://stackoverflow.com/questions/688549/finding-duplicate-values-in-mysql – max Mar 16 '23 at 11:11
  • This code is also very prone to race conditions - since the validations fire a select query to see if there are duplicates that may or may not complete before the previous row processed was deleted. If you really wanted to do it this way (you don't) you would have to use locks or some other mechanism to ensure that everything is happening in sync. – max Mar 16 '23 at 11:21
  • Since I am trying to delete duplicates, I wanted to go one by one and keep the most recent record. To do this I needed to use LIMIT 9999 OFFSET 1, and you are right. But I wanted to understand and learn how to fix this race condition problem in a loop in Ruby. – Baran Yeni Mar 16 '23 at 13:52
  • The solution to the race condition is to not use a loop in Ruby. That's something you do as a last resort. – max Mar 17 '23 at 17:11
  • Why do you need to do this in a loop? Are there some special validations? We need logs and more information if you want to get a solution. Just placing a bounty won't achieve anything if the question is missing half the info. And add what you did, what you described in comments, to the question. ex. did x, result y. (delete_all and others) – merof Mar 26 '23 at 16:25
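
The set-based approach max describes in the comments can be sketched in plain Ruby: decide up front which rows to remove, so the decision no longer depends on a validation query running between deletions. This is only an illustration under assumptions. The `Row` struct, the `email` duplicate key, the `stale_duplicate_ids` helper and the sample data are all hypothetical stand-ins, since the question does not show the model or which columns define a duplicate.

```ruby
# Hypothetical stand-in for a table row; `email` is an assumed duplicate key.
Row = Struct.new(:id, :email, :updated_at)

# Returns the ids of every row except the most recently updated one
# in each group of rows sharing the same key.
def stale_duplicate_ids(rows, key: :email)
  rows
    .group_by { |row| row[key] }
    .values
    .flat_map do |group|
      # Oldest first; ties are broken by id so the result is deterministic.
      sorted = group.sort_by { |row| [row.updated_at, row.id] }
      sorted[0...-1].map(&:id) # everything but the newest row
    end
end

rows = [
  Row.new(1, "a@example.com", Time.utc(2023, 1, 1)),
  Row.new(2, "a@example.com", Time.utc(2023, 2, 1)),
  Row.new(3, "a@example.com", Time.utc(2023, 3, 1)),
  Row.new(4, "b@example.com", Time.utc(2023, 1, 15)),
  Row.new(5, "c@example.com", Time.utc(2023, 1, 10)),
  Row.new(6, "c@example.com", Time.utc(2023, 1, 20)),
]

ids = stale_duplicate_ids(rows)
puts ids.inspect
```

In a Rails app the collected ids would presumably be removed in a single statement, for example with `TableName.where(id: ids).delete_all`. For large tables, the grouping itself is better done in SQL, as the linked questions show.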

1 Answer

Once you call `record.destroy!`, the `updated_at` attribute of the other duplicate records does not get updated, so they remain in their current order in the database. As a result, the loop can only delete the first duplicate record it encounters, but not the others.