What is the best way to find records with duplicate values in a column using ruby and the new Activerecord?
Asked
Active
Viewed 2.9k times
29
-
Duplicated values of just 1 column or more than 1? Is/Are these string/ints/text fields? – Danny Hiemstra Feb 24 '11 at 14:54
-
just 1 column - strings. – srboisvert Feb 24 '11 at 15:23
6 Answers
51
Translating @TuteC into ActiveRecord:
sql = 'SELECT id,
COUNT(id) as quantity
FROM types
GROUP BY name
HAVING quantity > 1'
#=>
Type.select("id, count(id) as quantity")
.group(:name)
.having("quantity > 1")

Ryan Clark
- 764
- 1
- 8
- 29

fl00r
- 82,987
- 33
- 217
- 237
-
7Is this code supposed to work for PostgreSQL as well? It returns the error `PGError: ERROR: column "quantity" does not exist` – Marc Jan 29 '12 at 02:33
-
6@Marc, I am not sure. But you can try `Type.select("id, count(id) as quantity").group(:name).having("count(id) > 1")` – fl00r Jan 29 '12 at 10:47
-
-
3@holaSenor - What is a non-exact duplicate, exactly? (pun intended) – Javid Jamae Nov 19 '14 at 19:34
-
Good question, I'm sure it made sense at the time. I suspect there was another comment, since deleted, that I was responding too. Probably a comment with some code that compared all fields in the row rather than one column, but who knows. Duplicate needs to be qualified by some criteria. By exact I think I meant the proposed code compared all attributes and they would have to be the same, OP wanted to use one column. – OpenCoderX Nov 19 '14 at 19:51
-
How can I go about having the other columns of the row to work with as well? – Deekor Nov 07 '16 at 22:33
-
1re fl00rs answer: Postgresql - ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "types.id" must appear in the GROUP BY clause or be used in an aggregate function. But add it to the group-by (no error), and then you get zero hits, since no two records have the same id-value. I've tried several of these solutions, and all either don't return the ID in the result (so who knows where the dups were), or fail to find the desired records when the ID is included. – JosephK Nov 26 '17 at 12:23
-
Best workaround is here: https://stackoverflow.com/questions/33361669/return-duplicate-records-activerecord-postgres Adapted to this would be: type_dups = Type.group(:name).having("count(name) > 1").count.keys; @results = Type.where(:name => type_dups) – JosephK Nov 26 '17 at 13:34
28
Here's how I solved it with the AREL helpers, and no custom SQL:
Person.select("COUNT(last_name) as total, last_name")
.group(:last_name)
.having("COUNT(last_name) > 1")
.order(:last_name)
.map{|p| {p.last_name => p.total} }
Really, it's just a nicer way to write the SQL. This finds all records that have duplicate last_name values, and tells you how many and what the last names are in a nice hash.

brookr
- 1,504
- 12
- 14
19
I was beating my head against this problem with a 2016 stack (Rails 4.2, Ruby 2.2), and got what I wanted with this:
> Model.select([:thing]).group(:thing).having("count(thing) > 1").all.size
=> {"name1"=>5, "name2"=>4, "name3"=>3, "name4"=>2, "name5"=>2}

Sam
- 1,205
- 1
- 21
- 39
11
With custom SQL, this finds types
with same values for name
:
sql = 'SELECT id, COUNT(id) as quantity FROM types
GROUP BY name HAVING quantity > 1'
repeated = ActiveRecord::Base.connection.execute(sql)

TuteC
- 4,342
- 30
- 40
5
In Rails 2.x, select is a private method of AR class. Just use find():
klass.find(:all,
:select => "id, count(the_col) as num",
:conditions => ["extra conditions here"],
:group => 'the_col',
:having => "num > 1")

simianarmy
- 1,485
- 10
- 13
-
3Totally off-topic, grave-digging comment here... but WHY IS THIS BETTER THAN SQL? – John Cromartie Dec 21 '12 at 17:33
-
2
Here is a solution that extends the other answers to show how to find and iterate through the records grouped by the duplicate field:
duplicate_values = Model.group(:field).having(Model.arel_table[:field].count.gt(1)).count.keys
Model.where(field: duplicate_values).group_by(&:field).each do |value, records|
puts "The records with ids #{records.map(&:id).to_sentence} have field set to #{value}"
end
It seems a shame this has to be done with two queries but this answer confirms this approach.

eremite
- 1,886
- 15
- 18