I've got an array A. I'd like to check if it contains duplicate values. How would I do so?
80
-
"marked as duplicate" well this is pretty meta considering the question. – cpursley Feb 18 '15 at 03:51
-
"How do I check an array for duplicates?" is not quite the same question as "How to find and return a duplicate value in an array?". This question is asking how to determine the uniqueness of the array, whereas the other is asking how to pull duplicate values out of the array. I don't think it should be marked as duplicate -- but the other question is similar, and should be linked in a comment. – emery Apr 11 '16 at 19:33
-
But the irony is rich, isn't it? – omikes Sep 14 '16 at 15:06
4 Answers
155
Just call `uniq` on it (which returns a new array without duplicates) and see whether the `uniq`ed array has fewer elements than the original:
if a.uniq.length == a.length
  puts "a does not contain duplicates"
else
  puts "a does contain duplicates"
end
Note that the objects in the array need to respond to `hash` and `eql?` in a meaningful way for `uniq` to work properly.
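To illustrate the note about `hash` and `eql?`, here is a minimal sketch. The classes `LoosePoint` and `StrictPoint` are hypothetical names made up for this example: the first overrides only `==`, the second also defines `eql?` and `hash` consistently.

```ruby
# Hypothetical class that overrides only ==.
# It inherits the identity-based eql? and hash from Object.
class LoosePoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(LoosePoint) && x == other.x && y == other.y
  end
end

# Hypothetical class that keeps ==, eql? and hash in sync.
class StrictPoint
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def ==(other)
    other.is_a?(StrictPoint) && x == other.x && y == other.y
  end
  alias eql? ==

  def hash
    [x, y].hash
  end
end

loose  = [LoosePoint.new(1, 2), LoosePoint.new(1, 2)]
strict = [StrictPoint.new(1, 2), StrictPoint.new(1, 2)]

# uniq relies on hash and eql?, so the LoosePoint pair is not collapsed.
puts loose.uniq.length != loose.length    # false: no duplicates detected
puts strict.uniq.length != strict.length  # true: the duplicate is detected
```

With `LoosePoint`, the length comparison reports no duplicates even though the two objects compare equal with `==`.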

sepp2k
-
Additionally, `uniq!` will return `nil` if no duplicates were found, altering self to remove duplicates. There are a lot of array methods available: http://ruby-doc.org/core/classes/Array.html – David Dec 04 '10 at 01:32
-
Of course, if they don't respond to `eql?` in a meaningful way, then what does "duplicate" even **mean**? And once you've defined `eql?`, `hash` should be consistent with it. – Karl Knechtel Dec 04 '10 at 01:33
-
@sepp2k Out of curiosity, why do you have `a.uniq.length == a.length` instead of just `a.uniq == a`? – Paul Hoffer Dec 04 '10 at 02:28
-
@Karl: I'm not sure what you mean by that. My point is there are classes which override `==`, but not `eql?` or `hash` (like `Array` and `Hash` in ruby 1.8.6 for example), and for those `uniq` would yield surprising results while a more primitive solution using `==` would not. – sepp2k Dec 04 '10 at 11:47
-
@sepp2k Just looked up the difference between `==` and `eql?` in Ruby. Don't really see what's going on here - why would you ever implement `eql?` as something other than a type-check followed by an `==` check? Will Ruby not fall back to `==` when `eql?` isn't specifically provided? That kinda sucks... – Karl Knechtel Dec 04 '10 at 12:03
-
@Karl: `Object#eql?` will call `equal?`, which checks for reference equality and should not be overridden. This corresponds with `Object#hash`, which returns the object's `object_id`. If `Object#eql?` called `==` then overriding `==` without overriding `hash` would break the same way that overriding `eql?` without `hash` does. The way it is now, `eql?` and `hash` are still in sync if you only override `==`, they just don't act the way one might expect (but that's still better than being totally broken). The problem is that ruby can't automatically define `hash` to be in sync with `==`. – sepp2k Dec 04 '10 at 12:15
-
@sepp2k `eql?` calls `equal?` ? That's just supposed to be an optimization, right? Irritating if that sort of optimization causes a headache. :( I understand that `eql?` is supposed to compare both value and type, so it shouldn't particularly care about identity - identity implies (value and type), sure, but just testing value and type separately would be fine anyway, surely? – Karl Knechtel Dec 04 '10 at 12:19
-
@Karl: How would you propose testing for the value? Calling `==`? As I said that would break if you don't override `hash`. Comparing every instance variable (and likewise hashing every instance variable in `hash`)? That might work in some/most cases, but does not always lead to the wanted behavior either (of course you could still override in the cases where it doesn't...). It should be noted that both `eql?` and `==` default to calling `equal?`, so generally objects don't have a notion of value other than their object_id unless you define one. – sepp2k Dec 04 '10 at 12:27
-
@sepp2k so... let me see if I have this straight. If you implement `==`, you need to implement `hash`. If you need to implement `eql?`, you need to implement `hash` as well. `eql?` is supposed to be equivalent to checking a type in addition to whatever `==` does. So what exactly is accomplished by offering the ability to override either or both of `==` and `eql?`? Why does Ruby make the distinction? – Karl Knechtel Dec 04 '10 at 12:31
-
@Karl Knechtel, See http://stackoverflow.com/questions/4351390/how-do-i-check-an-array-for-duplicates . One answer says that == tests for equality and eql? tests for equality and same type. – Wayne Conrad Dec 04 '10 at 15:22
-
@Wayne: But it doesn't return true right away if the sizes don't differ (which would be stupid of course). – sepp2k Dec 04 '10 at 15:50
-
@Nakilon Thanks, I'd never thought of that but it makes a lot of sense. – Paul Hoffer Dec 04 '10 at 17:44
-
@Wayne Conrad: "== tests for equality and eql? tests for equality and same type" ... that doesn't say anything I haven't already acknowledged several times now. My point: isn't testing for same type built-in? Why would you need to write a wrapper to combine the two tests, and, worse, re-write it for every class? For a Pythonista this is very confusing... we only have `==` (implemented using `__eq__`) and `is` (equivalent to `equal?`). – Karl Knechtel Dec 04 '10 at 19:56
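The distinctions discussed in the comments above can be seen on built-in types. This is only a small sketch using core Ruby classes; the variable names are illustrative.

```ruby
# == compares by value, so an Integer can equal a Float.
puts 1 == 1.0       # true
# eql? on numbers also requires the same class.
puts 1.eql?(1.0)    # false

# equal? is object identity; two separately built strings are distinct objects.
first  = String.new("ruby")
second = String.new("ruby")
puts first == second       # true
puts first.eql?(second)    # true
puts first.equal?(second)  # false

# uniq follows hash and eql?, so 1 and 1.0 are both kept.
puts([1, 1.0].uniq.length)  # 2

# uniq! returns nil when nothing was removed, otherwise the modified array.
p([1, 2, 3].uniq!)  # nil
p([1, 1, 2].uniq!)  # [1, 2]
```

Because `uniq!` modifies the receiver, using its `nil` return value as a duplicate check is only appropriate when altering the array is acceptable.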
39
In order to find the duplicated elements, I use this approach (with Ruby 1.9.3):
array = [1, 2, 1, 3, 5, 4, 5, 5]
=> [1, 2, 1, 3, 5, 4, 5, 5]
dup = array.select{|element| array.count(element) > 1 }
=> [1, 1, 5, 5, 5]
dup.uniq
=> [1, 5]

jmonteiro
-
select { array.count } is a nested loop, you're doing an O(n^2) complex algorithm for something which can be done in O(n). – apeiros Nov 16 '12 at 18:10
-
You're right: to answer Skizit's question we can use an O(n) approach; but in order to find out which elements are duplicated, an O(n^2) algorithm is the only way I can think of so far. – jmonteiro Nov 22 '12 at 18:57
-
@alex88 I didn't provide one as others already did. For example sepp2k's answer: https://stackoverflow.com/a/4351408/764342 – apeiros Feb 27 '18 at 19:00
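As a sketch of the single-pass idea raised in the comments, the duplicated values can also be collected by counting occurrences in a `Hash`. The helper name `duplicate_values` is hypothetical, and the linear running time assumes hash lookups take constant time on average.

```ruby
# Hypothetical helper: returns each value that occurs more than once,
# in order of first appearance.
def duplicate_values(array)
  counts = Hash.new(0)                            # missing keys start at 0
  array.each { |element| counts[element] += 1 }   # one pass over the array
  counts.select { |_, count| count > 1 }.keys     # one pass over the counts
end

p duplicate_values([1, 2, 1, 3, 5, 4, 5, 5])  # prints [1, 5]
```

Like `uniq`, this depends on the elements having meaningful `hash` and `eql?` implementations.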
9
If you want to return the duplicates, you can do this:
dups = [1,1,1,2,2,3].group_by{|e| e}.keep_if{|_, e| e.length > 1}
# => {1=>[1, 1, 1], 2=>[2, 2]}
If you want just the values:
dups.keys
# => [1, 2]
If you want the number of duplicates:
dups.map{|k, v| {k => v.length}}
# => [{1=>3}, {2=>2}]
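If a single hash of counts is more convenient than an array of one-entry hashes, a variant along these lines may help. It is a sketch only: the method name `duplicate_counts` is hypothetical, it uses the non-mutating `select` instead of `keep_if`, and it assumes Ruby 2.1 or later for `Array#to_h`.

```ruby
# Hypothetical helper: maps each duplicated value to its number of occurrences.
def duplicate_counts(array)
  array.group_by { |e| e }
       .select { |_, group| group.length > 1 }       # keep only repeated values
       .map { |value, group| [value, group.length] } # pair value with its count
       .to_h
end

counts = duplicate_counts([1, 1, 1, 2, 2, 3])
puts counts == { 1 => 3, 2 => 2 }  # true
```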

Benjamin Crouzier
4
Might want to monkeypatch Array if using this more than once:
class Array
  def uniq?
    self.length == self.uniq.length
  end
end
Then:
irb(main):018:0> [1,2].uniq?
=> true
irb(main):019:0> [2,2].uniq?
=> false

fakeleft
-
@sidewaysmilk why should this be avoided? It seems like the ruby way: elegant, dynamic and dry. – hamstar May 22 '14 at 02:32
-
One (main?) reason is that if you do it in code that ends up being used by others (gems), they pull in your monkeypatches, which can lead to unexpected behaviour. So I guess I should add a caveat that this should only be used in code you'll use yourself, and in a place that you can remember (a monkey_patch.rb file?), so you don't have to hunt too much for code that changes the default behaviour. – fakeleft May 23 '14 at 15:25
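For code that others will load, one way to sidestep the concern above is to keep the check in a plain helper module instead of reopening `Array`. The following is a minimal sketch, not taken from the answers: the module name `ArrayChecks` and the method `duplicates?` are hypothetical, and it swaps the length comparison for a `Set` so it can stop at the first repeated element.

```ruby
require 'set'

# Hypothetical helper module; the name is illustrative and not from any library.
module ArrayChecks
  # Returns true as soon as a repeated element is seen, false otherwise.
  def self.duplicates?(array)
    seen = Set.new
    # Set#add? returns nil when the element is already in the set.
    array.any? { |element| !seen.add?(element) }
  end
end

puts ArrayChecks.duplicates?([1, 2])  # false
puts ArrayChecks.duplicates?([2, 2])  # true
```

Note that this returns the opposite of the `uniq?` method above: `true` means the array does contain duplicates.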