Rails 5 ActiveRecord Delete Duplicates

Question

I have a Rails 5 app. I have a table filled with URL data that is pulled in from various sources:

id     url
1      http://google.com
2      http://yahoo.com
3      http://msn.com
4      http://google.com
5      http://yahoo.com
6      http://askjeeves.com

How can I remove the duplicates from this table?

[This is also quite a performant way to do it with plain SQL](https://stackoverflow.com/a/41545885/566484) — Christian Butzke, Oct 31 '19 at 09:48

score 23 · Answer 1 · edited May 21 '18 at 07:27

23

SQL solution without loops:

Model.where.not(id: Model.group(:url).select("min(id)")).destroy_all

OR

Model.where.not(id: Model.group(:url).select("min(id)")).delete_all

OR

dup_ids = Model.group(:url).select("min(id)").collect{|m| m['min(id)']}
Model.where.not(id: dup_ids).delete_all
#Model.where.not(id: dup_ids).destroy_all

This will delete all duplicates keeping records with minimum id for duplicate records.

edited May 21 '18 at 07:27

bragboy

34,892
30
114
171

answered Nov 08 '16 at 08:04

dnsh

3,516
2
22
47

Great and simple SQL solution. It took me a long time to find something that works like this. Many other solutions included something like `Model.select(:url).group(:url).having('count("url") > 1')` which always resulted in a `GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function`. – mabu Jun 18 '18 at 12:23
The best solution should be `Model.where.not(id: Model.group(:url).select("min(id)")).delete_all ` Unfortunately, this throws a "You can't specify target table 'models' for update in FROM clause". The generated SQL is something like this: ```SQL DELETE FROM `models` WHERE `models`.`id` NOT IN (SELECT min(id) FROM `models` GROUP BY `models`.`url`) ``` – Christian Butzke Oct 31 '19 at 09:45
(I am using MySQL 5.7) – Christian Butzke Oct 31 '19 at 09:56

score 6 · Accepted Answer · edited Dec 24 '19 at 13:33

6

You can group by url, leave one and delete duplicates:

Model.all.group(:url).values.each do |dup|
  dup.pop #leave one
  dup.each(&:destroy) #destroy other
end

edited Dec 24 '19 at 13:33

bragboy

34,892
30
114
171

answered Nov 08 '16 at 04:09

idej

5,034
1
11
14

score 5 · Answer 3 · answered Nov 13 '19 at 21:11

5

Get array of good ids and then delete all records not in that list.

good_ids = Model.group(:url).pluck("max(id)")
Model.where.not(id: good_ids).delete_all

answered Nov 13 '19 at 21:11

konyak

10,818
4
59
65

score 0 · Answer 4 · answered Nov 08 '16 at 04:09

// Find all duplicate records and group them by a field

dups = MyModel.group(:url).having('count("url") > 1').count(:name)

// Iterate on each grouped item to destroy duplicate

dups.each do |key, value|

  # Keep one and return rest of the duplicate records

  duplicates = MyModel.where(url: key)[1..value-1]
  puts "#{key} = #{duplicates.count}"
  duplicates.each(&:destroy)

end

score 0 · Answer 5 · answered Oct 31 '19 at 09:51

This also seems to be a solution.

I tried to convert it to Ruby, but it got quite complex (since I had more fields to group by), so I ended up just using plain SQL

DELETE t1 FROM 
urls t1
INNER JOIN (
    SELECT MAX(id) AS id, url FROM urls 
    GROUP BY url 
    HAVING COUNT(*) > 1
) t2 on t1.url = t2.url and t1.id != t2.id;

Hope that helps

Rails 5 ActiveRecord Delete Duplicates

5 Answers5

Linked