Here is one way to build a hash whose keys are words and whose values are arrays of hashes with keys :title
and :count
, the hashes ordered by decreasing value of count
.
Code
I am assuming we will start with a hash books
, whose keys are titles and whose values are all the text in the book in one string.
def word_count_hash(books)
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
title_and_count_by_word = word_and_count_by_title
.each_with_object({}) { |(title,words),g| words.each { |w,count|
g.update({w =>[{title: title, count: count}]}){|_,oarr,narr|oarr+narr}}}
title_and_count_by_word.keys.each { |w| g[w].sort_by! { |h| -h[:count] } }
title_and_count_by_word
end
Example
books = {}
books["Grapes of Wrath"] =
<<_
To the red country and part of the gray country of Oklahoma, the last rains
came gently, and they did not cut the scarred earth. The plows crossed and
recrossed the rivulet marks. The last rains lifted the corn quickly and
scattered weed colonies and grass along the sides of the roads so that the
gray country and the dark red country began to disappear under a green cover.
_
books["Tale of Two Cities"] =
<<_
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us, we were all going direct to Heaven, we were all
going direct the other way
_
books["Moby Dick"] =
<<_
Call me Ishmael. Some years ago - never mind how long precisely - having little
or no money in my purse, and nothing particular to interest me on shore, I
thought I would sail about a little and see the watery part of the world. It is
a way I have of driving off the spleen and regulating the circulation. Whenever
I find myself growing grim about the mouth; whenever it is a damp, drizzly
November in my soul; whenever I find myself involuntarily pausing before coffin
warehouses
_
Construct the hash:
title_and_count_by_word = word_count_hash(books)
and then look up words:
title_and_count_by_word["the"]
#=> [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}]
title_and_count_by_word["to"]
#=> [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}]
Note the words being looked up must be entered in (or converted to) lower case.
Explanation
Construct the first hash:
word_and_count_by_title = books.each_with_object({}) { |(title,words),h|
h[title] = words.scan(/\w+/)
.map(&:downcase)
.each_with_object({}) { |w,g| g[w] = (g[w] || 0)+1 } }
#=> {"Grapes of Wrath"=>
# {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1},
# "Tale of Two Cities"=>
# {"it"=>10, "was"=>10, "the"=>11, "best"=>1, "of"=>10, "times"=>2,
# ...
# "going"=>2, "direct"=>2, "to"=>1, "heaven"=>1, "other"=>1, "way"=>1},
# "Moby Dick"=>
# {"call"=>1, "me"=>2, "ishmael"=>1, "some"=>1, "years"=>1, "ago"=>1,
# ...
# "pausing"=>1, "before"=>1, "coffin"=>1, "warehouses"=>1}}
To see what's happening here, consider the first element of books
that Enumerable#each_with_object passes into the block. The two block variables are assigned the following values:
title
#=> "Grapes of Wrath"
words
#=> "To the red country and part of the gray country of Oklahoma, the
# last rains came gently,\nand they did not cut the scarred earth.
# ...
# the dark red country began to disappear\nunder a green cover.\n"
each_with_object
has created a hash represented by the block variable h
, which is initially empty.
First construct an array of words and convert each to lower-case.
q = words.scan(/\w+/).map(&:downcase)
#=> ["to", "the", "red", "country", "and", "part", "of", "the", "gray",
# ...
# "began", "to", "disappear", "under", "a", "green", "cover"]
We may now create a hash that contains a count of each word for the title "Grapes of Wrath":
h[title] = q.each_with_object({}) { |w,g| g[w] = (g[w] || 0) + 1 }
#=> {"to"=>2, "the"=>12, "red"=>2, "country"=>4, "and"=>6, "part"=>1,
# ...
# "disappear"=>1, "under"=>1, "a"=>1, "green"=>1, "cover"=>1}
Note the expression
g[w] = (g[w] || 0) + 1
If the hash g
already has a key for the word w
, this expression is equivalent to
g[w] = g[w] + 1
On the other hand, if g
does not have this key (word) (in which case g[w] => nil
), then the expression is eqivalent to
g[w] = 0 + 1
The same calculations are then performed for each of the other two books.
We can now construct the second hash.
title_and_count_by_word =
word_and_count_by_title.each_with_object({}) { |(title,words),g|
words.each { |w,count| g.update({ w => [{title: title, count: count}]}) \
{ |_, oarr, narr| oarr + narr } } }
#=> {"to" => [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
#=> "the" => [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=> [{:title=>"Moby Dick", :count=>1}]}
(Note that this operation does not order the hashes for each word by :count
, even though that may appear to be the case in this output fragment. The hashes are sorted in the next and final step.)
The main operation here that requires explanation is Hash#update (aka Hash#merge!
). We are building a hash denoted by the block variable g
, which initially is empty. The keys of this hash are words, the values are hashes with keys :title
and :count
. Whenever the hash being merged has a key (word) that is already a key of g
, the block
{ |_, oarr, narr| oarr + narr }
is called to determine the value for the key in the merged hash. The block variables here are the key (word) (which we have replaced with an underscore because it will not be used), the old array of hashes and the new array of hashes to be merged (of which there is just one). We simply add the new hash to merged array of hashes.
Lastly we sort the values of the hash (which are arrays of hashes) on decreasing value of :count
.
title_and_count_by_word.keys.each { |w| g[w].sort_by! { |h| -h[:count] } }
title_and_count_by_word
#=> {"to"=>
# [{:title=>"Grapes of Wrath", :count=>2},
# {:title=>"Tale of Two Cities", :count=>1},
# {:title=>"Moby Dick", :count=>1}],
# "the"=>
# [{:title=>"Grapes of Wrath", :count=>12},
# {:title=>"Tale of Two Cities", :count=>11},
# {:title=>"Moby Dick", :count=>5}],
# ...
# "warehouses"=>[{:title=>"Moby Dick", :count=>1}]}