Removing duplicates with rake task

Question

There's a lot of similiar questions like this, but couldn't find good answer for me.

I have EntryVote model with fields user_id, entry_id and some others.

I want to create simple rake task to remove duplicates for user_id, entry_id groups (doesn't matter whic record left from the group). What's the best way of doing it?

For example:

id, user_id, entry_id
1,1,1
2,1,1
3,1,1
4,5,6
5,5,6
6,7,7

I get:

1,1,1
4,5,6
6,7,7

I know how to select user_id, entry_id for deduplication, but not sure how to work with it afterward:

EntryVote.select('user_id, entry_id').group('user_id,entry_id').having('count() > 1')

For confirmation, you want to remove duplicate `EntryVotes` where the `user_id` and `entry_id` are the same? — Robin Fisher, Mar 05 '13 at 11:56

score 0 · Answer 1 · answered Mar 05 '13 at 12:36

might not be the best solution out there but try the following

EntryVote.count(:id, group: [:user_id, :entry_id]).each do |(user_id, entry_id), count|
  if count > 1
    EntryVote.offset(1).where(user_id: user_id, entry_id: entry_id).delete_all
  end
end

or you can add a validation that checks for uniqueness of user_id and entry_id and try to save the record. If the record doesn't save and fails due to the validation, then just delete the record. I'm pretty sure this is slower that the first option :)

score 0 · Answer 2 · edited May 23 '17 at 11:57

If you want the columns entry_id and user_id to be a unique foreign key, the following rake task which contains a special SQL delete statement would help

  task 'delete_duplicates' => :environment do
    puts "Removing duplicates in table entry_votes"
    puts "Entries before: #{n1=EntryVote.count}"
    sql = "delete e1 from entry_votes e1, entry_votes e2 "+
          "where (e1.user_id = e2.user_id) and (e1.entry_id = e2.entry_id) "+
          "and (e1.id > 12.id);")
    ActiveRecord::Base.connection.execute(sql);
    puts "Entries after: #{n2=EntryVote.count}, #{n1-n2} duplicates removed"
  end

See also this SO question about duplicates or this article how to delete duplicates using SQL.

Removing duplicates with rake task

2 Answers2