0

Say I have an array like this:

arr = ['footballs_jumba_10', 'footballs_jumba_11', 'footballs_jumba_12',
       'footballs_jumba_14', 'alpha_romeo_11', 'alpha_romeo_12',
       'alpha_juliet_10', 'alpha_juliet_11']

If I wanted to return duplicates (assuming some of the strings in the array were exactly identical), I would just write

return arr.detect{ |a| arr.count(a) > 1 }

But what if I wanted only the duplicates among the first 10 characters of each element of the array, without knowing the variations beforehand? Like this:

['footballs_', 'alpha_rome', 'alpha_juli']
Cary Swoveland
oorahduc
  • Your example would have been better had you included a string whose first 10 characters were unique, as it would not have been returned in the desired result. (Too late now to change it.) – Cary Swoveland Nov 19 '15 at 02:19

3 Answers

1

Use Array#uniq:

arr.map {|e| e[0..9]}.uniq
# => ["footballs_", "alpha_rome", "alpha_juli"]
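Note that `uniq` alone keeps every distinct prefix, including ones that occur only once. If only repeated prefixes are wanted, one possible variant swaps `uniq` for `group_by`. This is a minimal sketch, not part of the original answer; the extra element `unique_entry_1` and the name `repeated_prefixes` are assumptions added for illustration.

```ruby
arr = %w[footballs_jumba_10 footballs_jumba_11 footballs_jumba_12
         footballs_jumba_14 alpha_romeo_11 alpha_romeo_12
         alpha_juliet_10 alpha_juliet_11 unique_entry_1]

# Group the full strings by their first 10 characters, then keep only
# the prefixes that are shared by more than one string.
repeated_prefixes = arr.group_by { |e| e[0, 10] }
                       .select { |_prefix, members| members.size > 1 }
                       .keys
# => ["footballs_", "alpha_rome", "alpha_juli"]
```

The prefix `"unique_ent"` is dropped because its group has a single member.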
Yu Hao
  • With `arr << "add a unique string"`, `arr.map {|e| e[0..9]}.uniq #=> ["footballs_", "alpha_rome", "alpha_juli", "add a uniq"]`, but only duplicates are wanted. – Cary Swoveland Nov 19 '15 at 02:00
1

This is quite straightforward with the method Array#difference that I proposed in my answer here:

arr << "Let's add a string that appears just once"
  #=> ["footballs_jumba_10", "footballs_jumba_11", "footballs_jumba_12",
  #    "footballs_jumba_14", "alpha_romeo_11", "alpha_romeo_12",
  #    "alpha_juliet_10", "alpha_juliet_11", "Let's add a string that appears just once"]

a = arr.map { |s| s[0,10] }
  #=> ["footballs_", "footballs_", "footballs_", "footballs_", "alpha_rome",
  #    "alpha_rome", "alpha_juli", "alpha_juli", "Let's add "] 
b = a.difference(a.uniq)
  #=> ["footballs_", "footballs_", "footballs_", "alpha_rome", "alpha_juli"] 
b.uniq
  #=> ["footballs_", "alpha_rome", "alpha_juli"] 
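The `difference` used above is a proposed, multiset-style method: it removes one occurrence from the receiver for each element of the argument. Since Ruby 2.6 there is a core `Array#difference` that behaves like `Array#-` and removes all occurrences, so on those versions `a.difference(a.uniq)` returns `[]`. The following is a rough sketch of the multiset behaviour as a standalone helper; the name `multiset_difference` and the sample data are assumptions, and this is not necessarily the implementation from the linked answer.

```ruby
# Hypothetical helper (not a core method): removes from `a` one occurrence
# for each element of `b`, so surplus copies in `a` survive.
def multiset_difference(a, b)
  # Count how many copies of each element still have to be removed.
  remaining = b.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true   # drop this copy
    else
      false  # keep surplus copies
    end
  end
end

prefixes = %w[footballs_ footballs_ footballs_ alpha_rome alpha_rome solo_entry]
extras = multiset_difference(prefixes, prefixes.uniq)
# => ["footballs_", "footballs_", "alpha_rome"]
extras.uniq
# => ["footballs_", "alpha_rome"]
```

Elements that appear exactly once, such as `"solo_entry"`, are removed entirely, which is why the final `uniq` yields only the duplicated prefixes.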
Cary Swoveland
0

You could do something like this:

def partial_duplicates(elements)
  unique = {}
  duplicates = {}

  elements.each do |e|
    partial = e[0..9]

    # If the element is in the hash, it is a duplicate.
    if first_element = unique[partial]
      duplicates[first_element] = true
      duplicates[e] = true
    else
      # include the element as unique
      unique[partial] = e
    end
  end

  duplicates.keys
end

This returns each duplicate only once. If you want every occurrence of every duplicate, collect them in an Array instead of a Hash.

Also, this returns the full string of each duplicate rather than just its prefix, as that seems more useful and is probably what you want:

partial_duplicates(arr)
=> ["footballs_jumba_10", "footballs_jumba_11", "footballs_jumba_12", "footballs_jumba_14", "alpha_romeo_11", "alpha_romeo_12", "alpha_juliet_10", "alpha_juliet_11"]

If you want only the partial duplicates you can change the condition to:

if unique[partial]
  duplicates[partial] = true
else
  unique[partial] = true
end

then:

partial_duplicates(arr)
=> ["footballs_", "alpha_rome", "alpha_juli"]
mrstif
  • Since `e` is a string and `duplicates` is a hash, you can't write `duplicates << e`, but you're on the right track. I suggest you make `unique` and `duplicates` sets (`require 'set'; unique = Set.new`). You could write `duplicates.add(e)` (or its alias `duplicates << e`) to add `e` to the set but [Set#add?](http://ruby-doc.org/stdlib-2.2.0/libdoc/set/rdoc/Set.html#method-i-add-3F) would be better, as it both adds the element if it's not already in the set and tells you if it was added. Your last line is different and gives the wrong answer when `arr` contains a unique element. Test! – Cary Swoveland Nov 19 '15 at 03:29
  • Thanks for the tips Cary. `duplicates << e` was a mistake indeed. But since `Set` actually uses hashes in its implementation, I decided to keep the hash, thus not needing the extra `require`. As for the last suggestion, I removed it entirely. – mrstif Nov 19 '15 at 12:23
  • I don't think avoiding a `require` is a good reason to use phoney hashes rather than sets. By that argument you would never use sets, but we have them for just this type of situation. It's analogous to replacing sets in mathematics with functions having certain properties. It would make the math harder to follow with no advantage. (It probably would also spark a rash of suicides among mathematicians. Fortunately, most Rubyists are not that passionate.) I would be interested in the views of other readers about this. – Cary Swoveland Nov 19 '15 at 14:21
  • Cary, my decision to use a hash in this case is simply one of simplifying the code in this example. I don't see how a `Set` makes this example simpler or more useful. And the *phoney* hashes, as you call them, are basically how [Set](http://ruby-doc.org/stdlib-2.2.3/libdoc/set/rdoc/Set.html#method-i-add) is implemented. – mrstif Nov 19 '15 at 14:22
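For comparison, the `Set#add?` approach suggested in the comments might look roughly like this. It is only a sketch: the method name `duplicate_prefixes` and its `length` parameter are assumptions made for illustration, not code posted by either commenter.

```ruby
require 'set'

# Hypothetical Set-based variant: returns each prefix that occurs
# more than once, in order of first repetition.
def duplicate_prefixes(elements, length = 10)
  seen = Set.new
  duplicates = Set.new
  elements.each do |e|
    prefix = e[0, length]
    # Set#add? returns nil when the prefix was already in the set,
    # which means this prefix has been seen before.
    duplicates << prefix unless seen.add?(prefix)
  end
  duplicates.to_a
end

arr = %w[footballs_jumba_10 footballs_jumba_11 footballs_jumba_12
         footballs_jumba_14 alpha_romeo_11 alpha_romeo_12
         alpha_juliet_10 alpha_juliet_11]
arr << "Let's add a string that appears just once"

duplicate_prefixes(arr)
# => ["footballs_", "alpha_rome", "alpha_juli"]
```

Both the hash and the set versions make a single pass over the array; the set version mainly differs in stating the intent (membership only, no values) more directly.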