How to select unique elements

Question

I would like to extend the Array class with a uniq_elements method that returns the elements with a multiplicity of one. I would also like to be able to pass a block to the new method, as with uniq. For example:

t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]

Example with closure:

t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements{|z| z.round} # => [2.0, 5.1]

Neither t-t.uniq nor t.to_set-t.uniq.to_set works. I don't care about speed; I call it only once in my program, so it can be slow.
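To illustrate why the two attempts above come back empty, here is a small check. Array#- removes every occurrence of each element found in its argument, and converting to a Set discards multiplicity before the subtraction happens. The variable names are only for illustration.

```ruby
require 'set'

t = [1, 2, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 9, 9]

# Array#- drops *every* occurrence of each element present in the argument,
# and t.uniq contains every distinct element of t, so nothing survives.
array_attempt = t - t.uniq

# Converting to sets throws away the counts first, so both sides are equal.
set_attempt = t.to_set - t.uniq.to_set

p array_attempt       # prints []
p set_attempt.to_a    # prints []
```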

Not clear. Why is `5.7` included in the result of the second example? — sawa, Jul 28 '14 at 00:55

score 14 · Accepted Answer · edited May 23 '17 at 11:53

Helper method

The solution below uses the following helper:

class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end

This method is similar to Array#-. The difference is illustrated in the following example:

a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]

a - b              #=> [1]
c = a.difference b #=> [1, 3, 2, 2]

As you see, a contains three 3's and b contains two, so the first two 3's in a are removed in constructing c (a is not mutated). When b contains at least as many instances of an element as a does, c contains no instances of that element. To remove elements beginning at the end of a:

a.reverse.difference(b).reverse #=> [3, 1, 2, 2]

Array#difference! could be defined in the obvious way.
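A minimal sketch of what such an in-place variant might look like. The names multiset_difference and multiset_difference! are hypothetical; they are used here because Ruby 2.6 and later ship a built-in Array#difference that behaves like Array#-, and reopening it would change core behaviour.

```ruby
class Array
  # Same logic as the helper above, under a hypothetical name so that the
  # built-in Array#difference (Ruby 2.6+) is left untouched.
  def multiset_difference(other)
    counts = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    reject { |e| counts[e] > 0 && counts[e] -= 1 }
  end

  # In-place variant: replaces the receiver's contents and returns self.
  def multiset_difference!(other)
    replace(multiset_difference(other))
  end
end

a = [3, 1, 2, 3, 4, 3, 2, 2, 4]
b = [2, 3, 4, 4, 3, 4]
a.multiset_difference!(b)
p a # prints [1, 3, 2, 2]
```

Delegating to the non-mutating method and calling replace keeps the two versions in step, at the cost of building one temporary array.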

I have found many uses for this method (the original answer links to each of them).

I have proposed that this method be added to the Ruby core.

When used with Array#-, this method makes it easy to extract the unique elements from an array a:

a = [1,3,2,4,3,4]
u = a.uniq          #=> [1, 2, 3, 4]
u - a.difference(u) #=> [1, 2]

This works because

a.difference(u)     #=> [3,4]

contains all the non-unique elements of a (each possibly more than once).

Problem at Hand

Code

class Array
  def uniq_elements(&prc)
    prc ||= ->(e) { e }
    a = map { |e| prc[e] }
    u = a.uniq
    uniques = u - a.difference(u)
    # Keep the elements whose converted value occurs exactly once.
    select { |e| uniques.include?(prc[e]) }
  end
end

Examples

t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements
  #=> [1,3,5,6,8]

t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round }
  # => [2.0, 5.1]

Cary Swoveland · Answer 2 · 2014-07-28T18:25:33.860

Here's another way.

Code

require 'set'

class Array
  def uniq_elements(&prc)
    prc ||= ->(e) { e }
    uniques, dups = {}, Set.new
    each do |e|
      k = prc[e]
      ((uniques.key?(k)) ? (dups << k; uniques.delete(k)) :
          uniques[k] = e) unless dups.include?(k)
    end
    uniques.values
  end
end

Examples

t = [1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements #=> [1,3,5,6,8]

t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
t.uniq_elements { |z| z.round } # => [2.0, 5.1]

Explanation

If uniq_elements is called with a block, it is received as the proc prc.
If uniq_elements is called without a block, prc is nil, so the first statement of the method sets prc to the default proc (a lambda that returns its argument).
An initially empty hash, uniques, holds the candidates for unique values. Its values are elements of the array self; its keys are what prc returns when called with the element: k = prc[e].
The set dups contains the keys that have been found not to be unique. It is a set (rather than an array) to speed lookups. Alternatively, it could be a hash with the non-unique keys as keys and arbitrary values.
The following steps are performed for each element e of the array self:
- k = prc[e] is computed.
- if dups contains k, e is a dup, so nothing more needs to be done; else
- if uniques has a key k, e is a dup, so k is added to the set dups and the element with key k is removed from uniques; else
- the element k=>e is added to uniques as a candidate for a unique element.
The values of uniques are returned.
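The steps above can be followed on a small input with the sketch below. The standalone method name trace_uniq_elements is hypothetical; it repeats the same logic with an explicit if/elsif/else and records the state of uniques and dups after each element.

```ruby
require 'set'

# Hypothetical tracing helper: same algorithm as uniq_elements above,
# written as a plain method that also records intermediate state.
def trace_uniq_elements(array, &prc)
  prc ||= ->(e) { e }
  uniques = {}
  dups = Set.new
  steps = []
  array.each do |e|
    k = prc[e]
    if dups.include?(k)
      # k is already known to be duplicated; nothing to do
    elsif uniques.key?(k)
      dups << k            # second sighting: k is not unique
      uniques.delete(k)
    else
      uniques[k] = e       # first sighting: candidate for uniqueness
    end
    steps << [e, uniques.dup, dups.to_a]
  end
  [uniques.values, steps]
end

result, steps = trace_uniq_elements([1.0, 1.1, 2.0, 3.0, 3.4]) { |z| z.round }
steps.each { |e, u, d| puts "#{e}: uniques=#{u} dups=#{d}" }
p result # prints [2.0]
```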

thx, I used this method till now, but it doesn't receive a block: `def uelements(a) t=a.sort u=[] u.push t1[0] if t1[0] != t1[1] for i in 1..t.size-2 do u.push t[i] if t[i] != t[i+1] && t[i] != t[i-1] end u.push t[-1] if t[-2] != t[-1] return u end` — Konstantin, Jul 28 '14 at 05:32

7stud · Answer 3 · 2014-07-28T03:30:56.263

class Array
  def uniq_elements
    counts = Hash.new(0)

    arr = map do |orig_val|
      converted_val =  block_given? ? (yield orig_val) : orig_val
      counts[converted_val] += 1
      [converted_val, orig_val]
    end

    uniques = []

    arr.each do |(converted_val, orig_val)|
      uniques << orig_val if counts[converted_val] == 1
    end

    uniques
  end
end

t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements

t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p  t.uniq_elements { |elmt| elmt.round }

--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]

Array#uniq does not find the non-duplicated elements; it removes duplicates, keeping one copy of each element.

VII, after the `map` block, consider `arr.each_with_object([]) do |(converted_val, orig_val),uniques|...end`. — Cary Swoveland, Jul 28 '14 at 16:12
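A sketch of what that suggestion might look like. To avoid redefining the method above, it is written as a standalone method with the hypothetical name uniq_elements_ewo; only the second loop changes.

```ruby
# Hypothetical standalone version of the method above, with the second loop
# rewritten using each_with_object as the comment suggests.
def uniq_elements_ewo(array)
  counts = Hash.new(0)

  pairs = array.map do |orig_val|
    converted_val = block_given? ? (yield orig_val) : orig_val
    counts[converted_val] += 1
    [converted_val, orig_val]
  end

  # each_with_object builds and returns the result array in one expression,
  # so the separate `uniques = []` and trailing `uniques` lines go away.
  pairs.each_with_object([]) do |(converted_val, orig_val), uniques|
    uniques << orig_val if counts[converted_val] == 1
  end
end

p uniq_elements_ewo([1, 2, 2, 3, 4, 4, 5]) # prints [1, 3, 5]
```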

score 1 · Answer 4 · answered Jul 20 '20 at 23:14

Use Enumerable#tally:

class Array
  def uniq_elements
    tally.select { |_obj, nb| nb == 1 }.keys
  end
end

t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
t.uniq_elements # => [1,3,5,6,8]

If you are using Ruby < 2.7, you can get tally with the backports gem:

require 'backports/2.7.0/enumerable/tally'
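The tally version above does not take a block. A minimal sketch of a block-aware variant follows, assuming Ruby 2.7+ (or the backport) for tally. The method name uniq_elements_by is hypothetical and is not part of the answer.

```ruby
class Array
  # Hypothetical block-aware variant built on Enumerable#tally (Ruby 2.7+).
  def uniq_elements_by
    return tally.select { |_obj, nb| nb == 1 }.keys unless block_given?

    keys = map { |e| yield e }   # converted value for each element
    counts = keys.tally          # how often each converted value occurs
    # Keep the original elements whose converted value occurs exactly once.
    zip(keys).select { |_e, k| counts[k] == 1 }.map(&:first)
  end
end

t = [1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]
p t.uniq_elements_by { |z| z.round } # prints [2.0, 5.1]
```

Pairing each element with its converted value via zip preserves the link between the two, which is what the rounding example needs.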

Boris Stitnicky · Answer 5 · 2014-07-28T04:44:34.760

class Array
  def uniq_elements
    zip(block_given? ? map { |e| yield e } : self)
      .each_with_object({}) { |(e, v), h| h[v] = h[v].nil? ? [e] : false }
      .values.reject(&:!).map(&:first)
  end
end

[1,2,2,3,4,4,5,6,7,7,8,9,9,9].uniq_elements #=> [1, 3, 5, 6, 8]
[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2].uniq_elements &:round #=> [2.0, 5.1]

edited Jul 28 '14 at 04:44

answered Jul 28 '14 at 03:22


each_with_index() is not necessary. You can just insert 1 every time. Also note: you traverse the array twice(the minimal number of times) but then you additionally have to call keys(). – 7stud Jul 28 '14 at 03:25
Now get rid of your Hash and use Hash.new(0) instead; there's no need to create all those arrays. – 7stud Jul 28 '14 at 03:39
I've done that before your comment was written, but thanks for attentiveness. – Boris Stitnicky Jul 28 '14 at 03:42
Yeah, but I was thinking it the first time I read your post! I still think mines more efficient...well, you nicely sidestep the map() call if there's no block given. Ahh..but yours won't work with the rounding example because you do not preserve the mapping between the original val and the converted val. – 7stud Jul 28 '14 at 03:42

7stud · Answer 6 · 2014-07-28T17:53:38.140

Creating and calling a default proc is a waste of time, and
Cramming everything into one line using tortured constructs doesn't make the code more efficient--it just makes the code harder to understand.
In require statements, rubyists don't capitalize file names.

....

require 'set'

class Array
  def uniq_elements
    uniques = {}
    dups = Set.new

    each do |orig_val|
      converted_val =  block_given? ? (yield orig_val) : orig_val
      next if dups.include? converted_val 

      if uniques.include?(converted_val)  
        uniques.delete(converted_val)
        dups << converted_val
      else
        uniques[converted_val] = orig_val
      end
    end

    uniques.values
  end
end


t=[1,2,2,3,4,4,5,6,7,7,8,9,9,9]
p t.uniq_elements

t=[1.0, 1.1, 2.0, 3.0, 3.4, 4.0, 4.2, 5.1, 5.7, 6.1, 6.2]

p  t.uniq_elements {|elmt|
  elmt.round
}

--output:--
[1, 3, 5, 6, 8]
[2.0, 5.1]
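The claim about the cost of a default proc can be checked on your own machine with a rough benchmark like the one below. Both method names are hypothetical, the methods are stripped down to the block call only, and timings vary by Ruby version and hardware, so no numbers are claimed here.

```ruby
require 'benchmark'

# Two hypothetical minimal methods that differ only in how the block is invoked.
def with_default_proc(array, &prc)
  prc ||= ->(e) { e }          # fall back to an identity lambda
  array.map { |e| prc[e] }
end

def with_yield(array)
  array.map { |e| block_given? ? (yield e) : e }
end

data = (1..200_000).to_a

Benchmark.bm(14) do |x|
  x.report('default proc:') { with_default_proc(data) }
  x.report('yield:')        { with_yield(data) }
end
```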

edited Jul 28 '14 at 17:53

answered Jul 28 '14 at 17:45


Thanks, 7. I initially had `next if...`, as you suggest, but switched to `unless` because the code was short, though `next` may read better. I prefer using the proc, in part because `unique_elements(&prc)` tells the reader immediately that a parameter may be being passed, and the first line clarifies that; a naked `unique_elements` suggests the opposite, until the reader sees `yield`. I downcased `set`. – Cary Swoveland Jul 28 '14 at 18:31
`I prefer using the proc, because tells the reader immediately that a parameter may be being passed` Fair enough, but ruby's syntax was created to allow you to pass a method into another method without specifying an argument. Also, frivolous method calls aren't free. `switched to unless` I follow 'Perl Best Practices' in regards to unless--it's an abomination. In any case, nice solution using only a single traversal of the array. – 7stud Jul 29 '14 at 03:38
