38

I have an array of words and I want to get a hash where the keys are the words and the values are the word counts.

Is there a more elegant way than mine?

result = Hash.new(0)
words.each { |word| result[word] += 1 }
return result
ceth

5 Answers

60

The imperative approach you used is probably the fastest implementation in Ruby. With a bit of refactoring, you can write a one-liner:

wf = Hash.new(0).tap { |h| words.each { |word| h[word] += 1 } }

Another imperative approach using Enumerable#each_with_object:

wf = words.each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }

A functional/immutable approach using existing abstractions:

wf = words.group_by(&:itself).map { |w, ws| [w, ws.length] }.to_h

Note that this is still O(n) in time, but it traverses the collection three times and creates two intermediate objects along the way.
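
To make the passes and intermediate objects visible, here is the same pipeline split into its stages. This is only an illustrative sketch: the sample `words` array and the variable names `groups` and `pairs` are made up for the example.

```ruby
# Hypothetical sample input, not from the original question.
words = %w[apple pear apple fig pear apple]

# Pass 1: group_by walks the array and builds a Hash of arrays (intermediate object 1).
groups = words.group_by(&:itself)
# groups => {"apple"=>["apple", "apple", "apple"], "pear"=>["pear", "pear"], "fig"=>["fig"]}

# Pass 2: map walks the groups and builds an Array of [word, count] pairs (intermediate object 2).
pairs = groups.map { |w, ws| [w, ws.length] }
# pairs => [["apple", 3], ["pear", 2], ["fig", 1]]

# Pass 3: to_h walks the pairs and builds the final Hash.
wf = pairs.to_h
# wf => {"apple"=>3, "pear"=>2, "fig"=>1}
```

Only the first pass touches every word; the second and third work on one entry per distinct word.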

Finally, a frequency counter/histogram is a common abstraction that you'll find in some libraries, such as Facets with its Enumerable#frequency:

require 'facets'
wf = words.frequency
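
If you want to check the "probably the fastest" claim on your own machine, a rough sketch with the standard library's Benchmark module could look like the following. The vocabulary, the array size and the repetition count are arbitrary assumptions, and the timings will vary with the Ruby version and the input, so treat the numbers as indicative only.

```ruby
require 'benchmark'

# Hypothetical test data: 10,000 words drawn from a small made-up vocabulary.
vocabulary = %w[alpha beta gamma delta epsilon zeta eta theta]
words = Array.new(10_000) { vocabulary.sample }

# Each lambda wraps one of the standard-library approaches shown in this answer.
approaches = {
  'each'             => -> { h = Hash.new(0); words.each { |w| h[w] += 1 }; h },
  'each_with_object' => -> { words.each_with_object(Hash.new(0)) { |w, acc| acc[w] += 1 } },
  'group_by'         => -> { words.group_by(&:itself).map { |w, ws| [w, ws.length] }.to_h }
}

# Print a timing table; every approach is run 100 times over the same array.
Benchmark.bm(18) do |x|
  approaches.each do |label, counter|
    x.report(label) { 100.times { counter.call } }
  end
end
```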
tokland
  • Maybe simply `str.split(" ").reduce(Hash.new(0)) { |h,w| puts h[w] += 1; h }`? – Bharath Mg Apr 24 '14 at 19:43
  • Some pinch-of-salt speed testing, Ruby 2.0.0p451 on a MacBook running Mavericks: Declarative: `100.times { words.inject(Hash.new 0) { |h, w| h[w] += 1; h } }`: avg 1.17s. Imperative: `100.times { hist = Hash.new 0; words.each { |w| hist[w] += 1 } }`: avg 1.09s. `words` was an array of 10k random words; generating the array alone took 0.2s on average. So the imperative version was about 9% faster. – Benji XVI May 27 '14 at 10:57
  • Thank you for the last note about Facets. I've re-implemented this several times now, and Facets saves me the trouble of redoing it or starting my own standard lib. For others: you should check out Facets, it's like an extension of Ruby's standard library. – Eric Hu Jul 13 '16 at 09:41
  • Great answer. I prefer the readability of `group_by(&:itself)`. – Marc-André Lafortune May 17 '17 at 18:43
  • Also, `each_with_object` fits better here than `reduce` IMO. – Marc-André Lafortune May 17 '17 at 18:46
10

Posted on a related question, but posting here for visibility as well:

Ruby 2.7 and later have the Enumerable#tally method, which solves this.

From the trunk documentation:

Tallys the collection. Returns a hash where the keys are the elements and the values are numbers of elements in the collection that correspond to the key.

["a", "b", "c", "b"].tally #=> {"a"=>1, "b"=>2, "c"=>1}
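
If the code also has to run on Rubies older than 2.7, one option is a small wrapper that falls back to the `each_with_object` idiom. This is a minimal sketch; `word_counts` is a hypothetical helper name, not part of Ruby or of any library.

```ruby
# word_counts is a hypothetical helper name, not part of Ruby or any library.
def word_counts(words)
  if words.respond_to?(:tally)
    words.tally # available from Ruby 2.7
  else
    # Fallback for older Rubies: count with a Hash whose default value is 0.
    words.each_with_object(Hash.new(0)) { |word, acc| acc[word] += 1 }
  end
end

counts = word_counts(%w[a b c b])
# counts => {"a"=>1, "b"=>2, "c"=>1}
```

One difference to keep in mind: the hash returned by `tally` has no default value, so looking up a missing word gives `nil`, while the fallback hash gives `0`.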
Pawan
7

With inject:

str = 'I have array of words and I want to get a hash, where keys are words'
result = str.split.inject(Hash.new(0)) { |h,v| h[v] += 1; h }

=> {"I"=>2, "have"=>1, "array"=>1, "of"=>1, "words"=>2, "and"=>1, "want"=>1, "to"=>1, "get"=>1, "a"=>1, "hash,"=>1, "where"=>1, "keys"=>1, "are"=>1}

I don't know about the efficiency.

Baldrick
  • According to the docs of the Facets method posted by tokland, `inject` is slower. – Baldrick Feb 28 '12 at 11:35
  • Also, if you use `inject` and need to return the object at the end of the block as above (`; h`), you should use `each_with_object` instead. – mfilej Sep 24 '13 at 10:12
2
irb(main):001:0> %w(foo bar foo bar).each_with_object(Hash.new(0)) { |w, m| m[w] += 1 }
=> {"foo"=>2, "bar"=>2}

As @mfilej said.

Boris Lopez
1

This one is elegant:

  words.group_by(&:itself).transform_values(&:count)

fanjieqi