0

Suppose I have a table as such

    id user_id  value
     1   1       A
     2   1        A
     3   1       A
     4   1       A
     5   1       A
     6   1       A
     7   2       B
     8   2       B
     9   3       C

As you can see, there are many duplicates in this table. For each duplicate, I want to be able to delete all but one of them, such that I am still left with one record for each of the duplicated records. For example, for the duplicate records with user_id = 1 and value = A, I want to be able to delete all but one of these so that I am still left with one record with the values of user_id = 1 and value = A

1 Answers1

0

Agree with moveson that the best approach would be to define an index within the database to either enforce distinct records or allow you to add a uniqueness validator to the model.

Another option would be to intercept the ActiveRecord create method call to prevent duplicates:

class Example < ApplicationRecord
  def self.create( attributes = nil, &block )
    attributes.each {|attr| self.create(attr, &block) } if attributes.is_a? Array
    return if find_by attributes
    super attributes, &block
  end
end

The basic idea here is to abort if you already see a record with the same attribute values in the database... but then let ActiveRecord continue to do all the heavy lifting by calling it with super.

If there's a situation where for some reason you can't exercise control over the database to prevent duplicates from being entered, you could try an approach where you create a class method to periodically purge the duplicates from the system.

Here's an example of a method that would iterate over the entire recordset and use a hash to track whether it has seen the values before, removing them if so:

class Example < ApplicationRecord
  def self.dedup
    distinct = {}
    all.each do |row|
      if distinct[row.user_id] && distinct[row.user_id] == row.value
        row.delete
      else
        distinct[row.user_id] = row.value
      end
    end
  end
end

You could then call it like so: Example.dedup whenever you wished to purge the duplicate records.

For those looking to follow along at home, you can use the following migration:

rails g model example user_id:integer value:string

and seed file:

examples = Example.create([
  {user_id: 1, value: 'A'},
  {user_id: 1, value: 'A'},
  {user_id: 1, value: 'A'},
  {user_id: 1, value: 'A'},
  {user_id: 1, value: 'A'},
  {user_id: 1, value: 'A'},
  {user_id: 1, value: 'A'},
  {user_id: 2, value: 'B'},
  {user_id: 2, value: 'B'},
  {user_id: 3, value: 'C'},
])
Cam
  • 921
  • 7
  • 13