29

What is the best way to find records with duplicate values in a column using ruby and the new Activerecord?

srboisvert
  • 12,679
  • 15
  • 63
  • 87

6 Answers6

51

Translating @TuteC into ActiveRecord:

sql = 'SELECT id, 
         COUNT(id) as quantity 
         FROM types 
         GROUP BY name 
       HAVING quantity > 1'
#=>
Type.select("id, count(id) as quantity")
  .group(:name)
  .having("quantity > 1")
Ryan Clark
  • 764
  • 1
  • 8
  • 29
fl00r
  • 82,987
  • 33
  • 217
  • 237
  • 7
    Is this code supposed to work for PostgreSQL as well? It returns the error `PGError: ERROR: column "quantity" does not exist` – Marc Jan 29 '12 at 02:33
  • 6
    @Marc, I am not sure. But you can try `Type.select("id, count(id) as quantity").group(:name).having("count(id) > 1")` – fl00r Jan 29 '12 at 10:47
  • this will only find exact duplicates. – OpenCoderX Dec 21 '12 at 15:04
  • 3
    @holaSenor - What is a non-exact duplicate, exactly? (pun intended) – Javid Jamae Nov 19 '14 at 19:34
  • Good question, I'm sure it made sense at the time. I suspect there was another comment, since deleted, that I was responding too. Probably a comment with some code that compared all fields in the row rather than one column, but who knows. Duplicate needs to be qualified by some criteria. By exact I think I meant the proposed code compared all attributes and they would have to be the same, OP wanted to use one column. – OpenCoderX Nov 19 '14 at 19:51
  • How can I go about having the other columns of the row to work with as well? – Deekor Nov 07 '16 at 22:33
  • 1
    re fl00rs answer: Postgresql - ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "types.id" must appear in the GROUP BY clause or be used in an aggregate function. But add it to the group-by (no error), and then you get zero hits, since no two records have the same id-value. I've tried several of these solutions, and all either don't return the ID in the result (so who knows where the dups were), or fail to find the desired records when the ID is included. – JosephK Nov 26 '17 at 12:23
  • Best workaround is here: https://stackoverflow.com/questions/33361669/return-duplicate-records-activerecord-postgres Adapted to this would be: type_dups = Type.group(:name).having("count(name) > 1").count.keys; @results = Type.where(:name => type_dups) – JosephK Nov 26 '17 at 13:34
28

Here's how I solved it with the AREL helpers, and no custom SQL:

Person.select("COUNT(last_name) as total, last_name")
  .group(:last_name)
  .having("COUNT(last_name) > 1")
  .order(:last_name)
  .map{|p| {p.last_name => p.total} }

Really, it's just a nicer way to write the SQL. This finds all records that have duplicate last_name values, and tells you how many and what the last names are in a nice hash.

brookr
  • 1,504
  • 12
  • 14
19

I was beating my head against this problem with a 2016 stack (Rails 4.2, Ruby 2.2), and got what I wanted with this:

> Model.select([:thing]).group(:thing).having("count(thing) > 1").all.size
 => {"name1"=>5, "name2"=>4, "name3"=>3, "name4"=>2, "name5"=>2}
Sam
  • 1,205
  • 1
  • 21
  • 39
11

With custom SQL, this finds types with same values for name:

sql = 'SELECT id, COUNT(id) as quantity FROM types
         GROUP BY name HAVING quantity > 1'
repeated = ActiveRecord::Base.connection.execute(sql)
TuteC
  • 4,342
  • 30
  • 40
5

In Rails 2.x, select is a private method of AR class. Just use find():

klass.find(:all, 
  :select => "id, count(the_col) as num", 
  :conditions => ["extra conditions here"], 
  :group => 'the_col', 
  :having => "num > 1")
simianarmy
  • 1,485
  • 10
  • 13
2

Here is a solution that extends the other answers to show how to find and iterate through the records grouped by the duplicate field:

duplicate_values = Model.group(:field).having(Model.arel_table[:field].count.gt(1)).count.keys
Model.where(field: duplicate_values).group_by(&:field).each do |value, records|
  puts "The records with ids #{records.map(&:id).to_sentence} have field set to #{value}"
end

It seems a shame this has to be done with two queries but this answer confirms this approach.

eremite
  • 1,886
  • 15
  • 18