58

How do I find an item in array which has the most occurrences?

[1, 1, 1, 2, 3].mode
=> 1

['cat', 'dog', 'snake', 'dog'].mode
=> dog
the Tin Man
  • 158,662
  • 42
  • 215
  • 303
  • 3
    What you're asking about is called the "mode." – Rob Kennedy Jan 05 '09 at 03:37
  • 1
    Method "mode" is not working in Rails 4 :( I found the answer here => http://stackoverflow.com/questions/8921999/ruby-how-to-find-and-return-a-duplicate-value-in-array :) – Romans 8.38-39 Jul 08 '14 at 05:00
  • @Romans8.38-39 - thanks, thought I was going mad. Just another useful thing that must be thrown out to break working code, it seems. Someone needs to fork Rails and put back all the good pieces. – JosephK Nov 17 '15 at 11:26

11 Answers11

86

First build a hash mapping each value in the array to its frequency…

arr = [1, 1, 1, 2, 3]

freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
#=> {1=>3, 2=>1, 3=>1}

… then use the frequency table to find the element with the highest frequency:

arr.max_by { |v| freq[v] }
#=> 1
Sophie Alpert
  • 139,698
  • 36
  • 220
  • 238
  • 1
    Or may be `freq.max_by { |_, v| v }.first` as the last line. – Srujan Barai Sep 05 '17 at 21:00
  • 2
    if you prefer a on-liner you could also do `arr.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }.max_by(&:last)` – mfittko Dec 01 '17 at 08:55
  • Plenty ways to do this^v^ `to_count = [[5, 6], [1, 2], [3, 4], [7, 8], [1, 2]]` `counter1 = to_count.each_with_object(Hash.new(0)){|it, acc| acc[it] += 1}` `counter2 = Hash.new(0).tap{|h| to_count.each{|it| h[it] += 1}}` `counter3 = to_count.group_by(&:itself).map{|it, its| [it, its.length]}.to_h` `counter4 = to_count.inject(Hash.new(0)) { |h,v| h[v] += 1; h }` `p counter1.max_by(&:last).first` `p counter2.max_by(&:last).first` `p to_count.max_by{|v| counter3[v]}` `p to_count.max_by{|v| counter4[v]}` `max1 = to_count.group_by{|n| n}.values.max_by(&:size).first` `p max1` – Weilory Mar 21 '20 at 07:33
  • Huh............ – vrintle Aug 10 '20 at 05:23
  • There are lot of way to do it but fastest is to use stack. see https://app.codility.com/programmers/lessons/8-leader/dominator/ – Vijay Meena Aug 17 '20 at 05:24
29
array.max_by { |i| array.count(i) }
Pang
  • 9,564
  • 146
  • 81
  • 122
Nathan
  • 7,627
  • 11
  • 46
  • 80
  • 1
    Instead of providing a code-only answer, adding an explanation on how your answer solves the problem will help your readers to learn. – ivan.sim Jun 07 '15 at 02:39
  • Thanks, you helped me! Your code returns the most frequent value only. I've modified it a bit to return all values sorted by max first: array.sort_by { |u| array.count(u) }.reverse – Al17 Mar 11 '16 at 14:24
  • This is `O(n^2)`, though. – YtvwlD Dec 09 '21 at 22:42
  • @YtvwlD optimization wasn't part of the OP's question. If you have a more optimal approach, why not just post it rather than harp on other's answers? – Nathan Dec 10 '21 at 23:10
27

While I adore the grep solution for its elegance and for reminding (or teaching) me about a method in Enumerable that I'd forgotten (or overlooked completely), it's slow, slow, slow. I agree 100% that creating the Array#mode method is a good idea, however - this is Ruby, we don't need a library of functions that act on arrays, we can create a mixin that adds the necessary functions into the Array class itself.

But the inject(Hash) alternative uses a sort, which we also don't really need: we just want the value with the highest occurrence.

Neither of the solutions address the possibility that more than one value may be the mode. Maybe that's not an issue in the problem as stated (can't tell). I think I'd want to know if there was a tie, though, and anyway, I think we can improve a little on the performance.

require 'benchmark'

class Array
  def mode1
    sort_by {|i| grep(i).length }.last
  end
  def mode2
    freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    sort_by { |v| freq[v] }.last    
  end
  def mode3
    freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    max = freq.values.max                   # we're only interested in the key(s) with the highest frequency
    freq.select { |k, f| f == max }         # extract the keys that have the max frequency
  end
end

arr = Array.new(1_000) { |i| rand(100) }    # something to test with

Benchmark.bm(30) do |r|
  res = {}
  (1..3).each do |i|
    m = "mode#{i}"
    r.report(m) do
      100.times do
        res[m] = arr.send(m).inspect
      end
    end
  end
  res.each { |k, v| puts "%10s = %s" % [k, v] }
end

And here's output from a sample run:

                                user     system      total        real
mode1                          34.375000   0.000000  34.375000 ( 34.393000)
mode2                           0.359000   0.000000   0.359000 (  0.359000)
mode3                           0.219000   0.000000   0.219000 (  0.219000)
     mode1 = 41
     mode2 = 41
     mode3 = [[41, 17], [80, 17], [72, 17]]

The "optimised" mode3 took 60% of the time of the previous record-holder. Note also the multiple highest-frequency entries.


A few months down the line, I noticed Nilesh's answer, which offered this:

def mode4
  group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
end

It doesn't work with 1.8.6 out of the box, because that version doesn't have Array#group_by. ActiveSupport has it, for the Rails developers, although it seems about 2-3% slower than mode3 above. Using the (excellent) backports gem, though, produces a 10-12% gain, as well as delivering a whole pile of 1.8.7 and 1.9 extras.

The above applies to 1.8.6 only - and mainly only if installed on Windows. Since I have it installed, here's what you get from IronRuby 1.0 (on .NET 4.0):

==========================   IronRuby   =====================================
(iterations bumped to **1000**)    user     system      total        real
mode1 (I didn't bother :-))
mode2                           4.265625   0.046875   4.312500 (  4.203151)
mode3                           0.828125   0.000000   0.828125 (  0.781255)
mode4                           1.203125   0.000000   1.203125 (  1.062507)

So in the event that performance is super-critical, benchmark the options on your Ruby version and OS. YMMV.

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
Mike Woodhouse
  • 51,832
  • 12
  • 88
  • 127
13

I found a faster method. Try this:

  class Array
    def mode4
      group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
    end
  end

The Benchmark output:

                                    user     system      total        real
mode1                          24.340000   0.070000  24.410000 ( 24.526991)
mode2                           0.200000   0.000000   0.200000 (  0.195348)
mode3                           0.120000   0.000000   0.120000 (  0.118200)
mode4                           0.050000   0.010000   0.060000 (  0.056315)
     mode1 = 76
     mode2 = 76
     mode3 = [[76, 18]]
     mode4 = 76
the Tin Man
  • 158,662
  • 42
  • 215
  • 303
Nilesh C
  • 697
  • 2
  • 7
  • 17
11
arr = [ 1, 3, 44, 3 ]
most_frequent_item = arr.uniq.max_by{ |i| arr.count( i ) }
puts most_frequent_item
#=> 3

No need to even think about frequency mappings.

etringer
  • 111
  • 1
  • 3
8

This is a duplicate of this question "Ruby - Unique elements in Array".

Here is that question's solution:

group_by { |n| n }.values.max_by(&:size).first

That version seems to be even faster than Nilesh C's answer. Here is the code I used to benchmark it (OS X 10.6 Core 2 2.4GHz MB).

Kudos to Mike Woodhouse for the (original) benchmarking code:

class Array
   def mode1
     group_by { |n| n }.values.max_by(&:size).first
   end
   def mode2
     freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
     max = freq.values.max                   # we're only interested in the key(s) with the highest frequency
     freq.select { |k, f| f == max }         # extract the keys that have the max frequency
   end
end

arr = Array.new(1_0000) { |i| rand(100000) }    # something to test with

Benchmark.bm(30) do |r|
    (1..2).each do |i| r.report("mode#{i}") { 100.times do arr.send("mode#{i}").inspect; end }; end
end

And here are the results of the benchmark:

                                user     system      total        real
mode1                           1.830000   0.010000   1.840000 (  1.876642)
mode2                           2.280000   0.010000   2.290000 (  2.382117)
 mode1 = 70099
 mode2 = [[70099, 3], [70102, 3], [51694, 3], [49685, 3], [38410, 3], [90815, 3], [30551, 3], [34720, 3], [58373, 3]]

As you can see, this version is about 20% faster with the caveat of ignoring ties. I also like the succinctness, I personally use it as-is without monkey patching all over the place. :)

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
Brandon
  • 1,956
  • 18
  • 18
7

Ruby versions >= 2.7 will have Enumerable#tally

Tallys the collection. Returns a hash where the keys are the elements and the values are numbers of elements in the collection that correspond to the key.

So, you can do

[1, 1, 1, 2, 3].tally
# => {1=>3, 2=>1, 3=>1} 
Santhosh
  • 28,097
  • 9
  • 82
  • 87
3

if you are trying to avoid learning #inject (which you should not do...)

words = ['cat', 'dog', 'snake', 'dog']
count = Hash.new(0)

words.each {|word| count[word] += 1}
count.sort_by { |k,v| v }.last

but if I read this answer before, now I would know nothing about #inject and man, you need to know about #inject.

Redoman
  • 3,059
  • 3
  • 34
  • 62
2
idx = {}
[2,2,1,3,1].each { |i| idx.include?(i) ? idx[i] += 1 : idx[i] = 1}

This is just a simple indexer. You could replace the [2,2,1..] array with any sort of symbol/string based identifier, this wouldn't work with objects, you'd need to introduce a bit more complexity, but this is simple enough.

rereading your questions, this solution is a bit over-engineered since its going to return you an index of all occurrences, not just the one with the most.

Derek P.
  • 1,569
  • 10
  • 19
2

Here's another version that does give you the ties as a mode should:

def mode
  group_by {|x| x}.group_by {|k,v| v.size}.sort.last.last.map(&:first)
end

In other words, group the values, then group those kv pairs by the number of values, then sort those kv pairs, take the last (highest) size-group, and then unwind its values. I like group_by.

glenn mcdonald
  • 15,290
  • 3
  • 35
  • 40
0
def mode(array)

    count = []  # Number of times element is repeated in array
    output = [] 
    array.compact!
    unique = array.uniq
    j=0

    unique.each do |i|
        count[j] = array.count(i)
        j+=1
    end
    k=0
    count.each do |i|
        output[k] = unique[k] if i == count.max
        k+=1
    end  

    return output.compact.inspect
end

p mode([3,3,4,5]) #=> [3]

p mode([1,2,3]) #=> [1,2,3]

p mode([0,0,0,0,0,1,2,3,3,3,3,3]) #=> [0,3]

p mode([-1,-1,nil,nil,nil,0]) #=> [-1]

p mode([-2,-2,3,4,5,6,7,8,9,10,1000]) #=> [-2]
Pang
  • 9,564
  • 146
  • 81
  • 122