arr = [1,2,1,3,5,2,4]

How can I count how many times each value occurs in the array, with the results sorted by value? I need the following output:

x[1] = 2  
x[2] = 2  
x[3] = 1  
x[4] = 1  
x[5] = 1
sawa
Mr. Black

11 Answers

x = arr.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
Michael Kohl
  • Many thanks Michael and Terw. I like how short this is. But can you please briefly explain the short line above? :) – Mr. Black Mar 29 '11 at 10:24
  • `inject` "injects" an accumulator into an Enumerable, which in our case is a Hash with a default value of `0`. On every iteration, we add one to the value with the key of the current element (`e`). Finally we return the accumulator. http://www.ruby-doc.org/core/classes/Enumerable.html#M001494 – Michael Kohl Mar 29 '11 at 10:30
  • The "inject" operation is often called "fold" in functional programming languages, which I think is a more intuitive name. – JesperE Mar 29 '11 at 20:27
  • But that code doesn't sort the hash, so it needs one more step: Hash[#code here#.sort] or even sort_by. – Dmitry Polushkin Jan 24 '12 at 08:28
  • Prefer `.each_with_object` over `inject` when building hashes, as opposed to doing arithmetic. See @sawa's answer below. – Volte Sep 16 '15 at 21:27
  • "Inject" is also known as reduce. I think that makes it really clear. – Darth Egregious Dec 03 '18 at 04:54
  • This *is* a *loop*. – nroose Jul 09 '20 at 19:43
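Following up on the comment about sorting: a minimal sketch, assuming Ruby 1.9 or later (where a Hash keeps insertion order), that counts with the same inject call and then rebuilds the hash with its keys in ascending order, using either Hash[...sort] or sort_by as suggested. The variable names are only illustrative.

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# inject threads the accumulator (a Hash whose missing keys default to 0)
# through every element; the trailing `h` hands it on to the next step.
counts = arr.inject(Hash.new(0)) { |h, e| h[e] += 1; h }

# Hash[...] rebuilds a hash from [key, value] pairs, here sorted by key.
sorted = Hash[counts.sort]

# Equivalent using sort_by on the key, then Array#to_h (Ruby 2.1+).
sorted_by_key = counts.sort_by { |key, _count| key }.to_h
```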

Ruby 2.7 added a shorter option: Enumerable#tally.

[1,2,1,3,5,2,4].tally  #=> { 1=>2, 2=>2, 3=>1, 5=>1, 4=>1 }

# Other possible usage

(1..6).map { |i| i % 3 }.tally   #=> {1=>2, 2=>2, 0=>2}

Mr. Black
  • `tally` doesn't accept a block in 2.7 https://docs.ruby-lang.org/en/2.7.0/Enumerable.html#method-i-tally – mikdiet Mar 30 '20 at 07:00
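tally returns the counts in the order each value is first seen, not sorted. A small sketch, assuming Ruby 2.7 or later, of turning that into the sorted result the question asks for:

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# Enumerable#tally counts occurrences; keys appear in first-seen order.
counts = arr.tally            # {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}

# Sort the [value, count] pairs by value and turn them back into a Hash.
sorted = counts.sort.to_h     # {1=>2, 2=>2, 3=>1, 4=>1, 5=>1}
```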

Only available in Ruby 1.9 and later.

Basically the same as Michael's answer, but a slightly shorter way:

x = arr.each_with_object(Hash.new(0)) {|e, h| h[e] += 1}

In similar situations:

  • When the starting object is mutable, such as an Array, Hash, or String, you can use each_with_object, as in the case above.
  • When the starting object is immutable, such as a Numeric, you have to use inject, as below.

    sum = (1..10).inject(0) {|sum, n| sum + n} # => 55
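The mutable/immutable distinction matters because each_with_object hands the same memo object to every iteration and returns that object at the end, whereas inject uses each block's return value as the next accumulator. A small sketch of the difference (the variable names are just for illustration):

```ruby
# each_with_object passes the SAME memo to every iteration and returns it.
# Mutating it in place (h[e] += 1 updates the hash) works as expected.
counts = [1, 2, 1].each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
# counts == {1=>2, 2=>1}

# Reassigning the block variable only rebinds it locally; the memo is an
# immutable Integer, so the method still returns the starting value.
broken_sum = (1..10).each_with_object(0) { |n, acc| acc += n }
# broken_sum == 0

# inject feeds each block result back in as the next accumulator.
sum = (1..10).inject(0) { |acc, n| acc + n }
# sum == 55
```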

sawa
  • In terms of characters, it's longer. In terms of tokens, it's shorter. Thanks for the comment. – sawa Mar 29 '11 at 20:26
  • Thanks @sawa. It's definitely shorter and faster, which matters because my actual array is mutable and holds a very large amount of data. Thanks once again. – Mr. Black Mar 30 '11 at 03:44
  • Though I've noticed with this approach that the values aren't in sorted order, as the question asked for. – Hengjie Oct 07 '14 at 01:21
  • This is the cleanest answer. `each_with_object` has been added to avoid `h[e] += 1 ; h` – Eric Duminil Dec 06 '16 at 12:38

Yet another approach, similar to the others:

result=Hash[arr.group_by{|x|x}.map{|k,v| [k,v.size]}]
  1. Group by each element's value.
  2. Map the grouping to an array of [value, count] pairs.
  3. Turn the array of pairs into key-value entries of a Hash, so that, e.g., result[1] returns 2.
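The three steps can be traced one at a time; a sketch with the intermediate values noted in comments (the names groups and pairs are only for illustration):

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# Step 1: group identical values together (keys in first-seen order).
groups = arr.group_by { |x| x }
# {1=>[1, 1], 2=>[2, 2], 3=>[3], 5=>[5], 4=>[4]}

# Step 2: replace each group with its size, giving [value, count] pairs.
pairs = groups.map { |k, v| [k, v.size] }
# [[1, 2], [2, 2], [3, 1], [5, 1], [4, 1]]

# Step 3: build a Hash from the pairs.
result = Hash[pairs]
result[1]  # 2
```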
lllllll
arr.group_by(&:itself).transform_values(&:size)
#=> {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}
EliadL

Whenever someone asserts that something is the fastest way to do this kind of primitive routine, I find it interesting to confirm that, because without confirmation most of us are really just guessing. So I took all of the methods here and benchmarked them.

I took an array of 120 links that I had extracted from a web page and needed to group by count, implemented each method inside a seconds = Benchmark.realtime do ... end block, and recorded the times.

Assume links is the name of the array I need to count:

require 'benchmark'

#0.00077
seconds = Benchmark.realtime do
  counted_links = {}
  links.each { |e| counted_links[e] = links.count(e) if counted_links[e].nil?}
end
seconds

#0.000232
seconds = Benchmark.realtime do
  counted_links = {}
  links.sort.group_by {|x|x}.each{|x,y| counted_links[x] = y.size}
end

#0.00076
seconds = Benchmark.realtime do 
  Hash[links.uniq.map{ |i| [i, links.count(i)] }]
end

#0.000107 
seconds = Benchmark.realtime do 
  links.inject(Hash.new(0)) {|h, v| h[v] += 1; h}
end

#0.000109
seconds = Benchmark.realtime do 
  links.each_with_object(Hash.new(0)) {|e, h| h[e] += 1}
end

#0.000143
seconds = Benchmark.realtime do 
  links.inject(Hash.new(0)) { |h, e| h[e] += 1 ; h }
end

And then a little bit of Ruby to figure out the answer:

times = [0.00077, 0.000232, 0.00076, 0.000107, 0.000109, 0.000143].min
==> 0.000107

So the fastest method here (YMMV, of course) is:

links.inject(Hash.new(0)) {|h, v| h[v] += 1; h}
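The same inject code was timed twice above, at 0.000107 and 0.000143 seconds, so a single run over 120 items is dominated by noise. A sketch that repeats each candidate many times with Benchmark.bm and first checks that all candidates agree. The links array here is a hypothetical stand-in (30 made-up URLs, each repeated 4 times), not the original scraped data, and the iteration count is arbitrary.

```ruby
require 'benchmark'

# Hypothetical stand-in for the 120 scraped links: 30 distinct URLs, 4 copies each.
links = Array.new(120) { |i| "http://example.com/page#{(i * 7) % 30}" }

CANDIDATES = {
  "each + count"     => ->(a) { h = {}; a.each { |e| h[e] = a.count(e) if h[e].nil? }; h },
  "sort.group_by"    => ->(a) { h = {}; a.sort.group_by { |x| x }.each { |x, y| h[x] = y.size }; h },
  "uniq.map + count" => ->(a) { Hash[a.uniq.map { |i| [i, a.count(i)] }] },
  "inject"           => ->(a) { a.inject(Hash.new(0)) { |h, v| h[v] += 1; h } },
  "each_with_object" => ->(a) { a.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 } },
}

# Sanity check: every candidate must produce the same counts
# (Hash#== compares contents, not order or default values).
expected = CANDIDATES["inject"].call(links)
CANDIDATES.each do |name, fn|
  raise "#{name} disagrees" unless fn.call(links) == expected
end

ITERATIONS = 2_000
Benchmark.bm(18) do |x|
  CANDIDATES.each do |name, fn|
    x.report(name) { ITERATIONS.times { fn.call(links) } }
  end
end
```

Repeating each call spreads one-off costs (garbage collection, cache warm-up) across many runs, so the relative ordering is more trustworthy than a single realtime measurement.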
fuzzygroup
x = Hash[arr.uniq.map{ |i| [i, arr.count(i)] }]

Ruby 2.1 and later also have Array#to_h:

x = arr.uniq.map{ |i| [i, arr.count(i)] }.to_h
rubyprince
  • Michael Kohl beat me, but his code should be faster. This code takes about twice as long. – ThoKra Mar 29 '11 at 10:18
  • @fl00r, that is interesting. I thought this would be slower, as it loops through the array and then uses the `count` method on it again. Maybe using built-in methods has its advantages. :) – rubyprince Mar 29 '11 at 10:29
  • @fl00r: Really? I originally had a version using `count`, but thought it wouldn't scale well with array length, so I replaced it with my current answer. Can you run your benchmark with a somewhat bigger array and compare again? – Michael Kohl Mar 29 '11 at 10:39
  • Not really. I was wrong. Since this is `O(n²)`, it is faster in benchmarks with small arrays, but it will be incredibly slow with big arrays. My mistake was testing the present array in a million-iteration benchmark, so it came out 20% faster. – fl00r Mar 29 '11 at 10:49
  • @fl00r, yeah, this would definitely be slow for larger arrays. – rubyprince Mar 29 '11 at 11:20
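To see fl00r's point about scaling, a rough sketch that times the count-based approach against a single-pass hash approach on arrays of distinct integers, the worst case for count, since every distinct value triggers a full scan. The sizes and the method names count_based and hash_based are arbitrary choices for illustration; if the count-based version is quadratic, its time should grow roughly fourfold each time n doubles, while the hash version should grow roughly linearly.

```ruby
require 'benchmark'

# Count-based: one full scan of the array per distinct value (O(n * k)).
def count_based(arr)
  arr.uniq.map { |i| [i, arr.count(i)] }.to_h
end

# Hash-based: a single pass over the array (O(n)).
def hash_based(arr)
  arr.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
end

[1_000, 2_000, 4_000].each do |n|
  arr = (1..n).to_a  # every value distinct: worst case for count_based
  t_count = Benchmark.realtime { count_based(arr) }
  t_hash  = Benchmark.realtime { hash_based(arr) }
  raise "results differ" unless count_based(arr) == hash_based(arr)
  puts format("n=%-5d count-based: %.4fs  hash-based: %.4fs", n, t_count, t_hash)
end
```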

Just for the record, I recently read about Object#tap here. My solution would be:

Hash.new(0).tap{|h| arr.each{|i| h[i] += 1}}

The #tap method passes its receiver to the block and then returns that same receiver. This is pretty handy when you have to build an array or hash incrementally.
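A quick sketch contrasting tap with Object#then (Ruby 2.6+, used here only for contrast): tap always hands back the receiver, while then hands back whatever the block returns.

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# tap yields the receiver (a new Hash with default 0) to the block and then
# returns that same receiver, ignoring what the block itself returns.
counts = Hash.new(0).tap { |h| arr.each { |i| h[i] += 1 } }
# counts == {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}

# Object#then returns the block's value instead. Here that value is arr,
# because Array#each returns its receiver, so the hash is lost.
block_value = Hash.new(0).then { |h| arr.each { |i| h[i] += 1 } }
# block_value.equal?(arr) is true
```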

erasing

I am sure there are better ways, but this works:

>> arr.sort.group_by {|x|x}.each{|x,y| print "#{x} #{y.size}\n"}
1 2
2 2
3 1
4 1
5 1

Assign the x and y values to a hash as needed.
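Building on the suggestion to collect x and y into a hash, a minimal sketch that stores the counts in a hash named x (matching the question's notation) and prints them in the requested format:

```ruby
arr = [1, 2, 1, 3, 5, 2, 4]

# Sort first so group_by inserts keys in ascending order, then store each
# group's size in a hash instead of printing it directly.
x = {}
arr.sort.group_by { |v| v }.each { |value, group| x[value] = group.size }

# Print in the format the question asked for, e.g. "x[1] = 2".
x.each { |value, count| puts "x[#{value}] = #{count}" }
```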

kurumi
  • It is not necessary to `sort` before `group_by`. `arr.group_by {...}` will do the same thing. – user102008 Aug 25 '11 at 21:37
  • @user102008 The OP implied the results are to be presented in order. Note `[2,1].group_by {|x|x} #=> {2=>[2], 1=>[1]}`. kurumi, what ways are better? – Cary Swoveland Oct 27 '14 at 19:58
  • @CarySwoveland: `group_by` returns a `Hash`, which has no order. The order in which the entries of a `Hash` are iterated is unpredictable. – user102008 Oct 27 '14 at 20:04
  • @user102008, `group_by` [preserves order](http://stackoverflow.com/questions/24378914/does-enumerables-group-by-preiserve-the-enumerables-order), at least in MRI 1.9+. AFAIK, it is not documented, but should be, as it's part of the spec. – Cary Swoveland Oct 27 '14 at 20:14
  • @CarySwoveland: The values corresponding to each key have an order; sorting might be relevant if you cared about that. But there is no order among the keys, for the very fact that the returned value is a `Hash`. So it would NOT have anything to do with `[2,1].group_by {|x|x} #=> {2=>[2], 1=>[1]}` – user102008 Oct 27 '14 at 23:19
  • I assume you are aware that recent versions of Ruby maintain insertion order. As Dave Thomas puts it (Pickaxe, p71): "And, as of Ruby 1.9, you’ll find something that might be surprising: Ruby remembers the order in which you add items to a hash. When you subsequently iterate over the entries, Ruby will return them in that order." Assuming `Array#each` has not been redefined, I would think the reference I cited earlier (which gives the source code and spec for `group_by`) is definitive. – Cary Swoveland Oct 28 '14 at 06:20
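A small check of the ordering question discussed in these comments, assuming Ruby 1.9 or later, where a Hash iterates in insertion order and group_by inserts keys in the order it first meets them:

```ruby
# Without sorting, keys follow the input order.
unsorted = [2, 1].group_by { |x| x }
unsorted.keys  # [2, 1]

# Sorting the input first makes the keys come out in ascending order.
sorted = [2, 1].sort.group_by { |x| x }
sorted.keys    # [1, 2]
```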

This should do it

arr = [1,2,1,3,5,2,4]

puts arr.inject(Hash.new(0)) {|h, v| h[v] += 1; h}
#=> {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}
ThoKra
arr = [1,2,1,3,5,2,4]
r = {}
arr.each { |e| r[e] = arr.count(e) if r[e].nil?}

Outputs

p r
#==> {1=>2, 2=>2, 3=>1, 5=>1, 4=>1}
thebugfinder