For the benefit of those new to Ruby, I have discussed alternative approaches to the problem, including the one that is the substance of this question.
The task
Suppose you are given an array
arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]
and wish to create the hash
{ :dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"] }
First solution
h = {}
arr.each do |k,v|
h[k] = [] unless h.key?(k)
h[k] << v
end
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
This is quite straightforward.
Second solution
More Ruby-like is to write:
h = {}
arr.each { |k,v| (h[k] ||= []) << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
When Ruby sees (h[k] ||= []) << v, the first thing she does is expand it to
(h[k] = h[k] || []) << v
If h does not have a key k, h[k] #=> nil, so the expression becomes
(h[k] = nil || []) << v
which becomes
(h[k] = []) << v
so
h[k] #=> [v]
Note that h[k] on the left of the assignment uses the method Hash#[]=, whereas h[k] on the right employs Hash#[].
This solution requires that none of the hash values equal nil or false, since ||= would replace either one with a new empty array.
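To make the caveat about ||= concrete, here is a minimal sketch. The hash and its keys (:pending, :done, :tags) are hypothetical and are not part of the original problem; they only illustrate that any falsy value already stored under a key is silently replaced.

```ruby
# Hypothetical data: one key already holds nil, another holds false.
h = { pending: nil, done: false, tags: ["a"] }

(h[:pending] ||= []) << "x"   # nil is falsy, so it is replaced by []
(h[:done]    ||= []) << "y"   # false is falsy too, so it is also replaced
(h[:tags]    ||= []) << "b"   # a truthy value is kept and appended to

h #=> {:pending=>["x"], :done=>["y"], :tags=>["a", "b"]}
```

For the task at hand this is harmless, because every value in the hash is an array.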
Third solution
A third approach is to give the hash a default value. If a hash h does not have a key k, h[k] returns the default value. There are two types of default values.
Passing the default value as an argument to Hash::new
If an empty array is passed as an argument to Hash::new, that value becomes the default value:
a = []
a.object_id
#=> 70339916855860
g = Hash.new(a)
#=> {}
g[k] returns [] when g does not have a key k. (The hash is not altered, however.) This construct has important uses, but it is inappropriate here. To see why, suppose we write
x = g[:cat] << "bo"
#=> ["bo"]
y = g[:dog] << "diva"
#=> ["bo", "diva"]
x #=> ["bo", "diva"]
This is because g[:cat] and g[:dog] both return the same object, the default array a, and << mutates that array in place. No key is ever added to g, which remains empty. We can see that the object is shared by examining object_ids:
x.object_id
#=> 70339916855860
y.object_id
#=> 70339916855860
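As a rough illustration of where a default passed as an argument is appropriate, and of what goes wrong with a mutable one, consider the sketch below. The counting example and the variable name counts are my own additions, not part of the question.

```ruby
# A shared default is harmless when it is immutable, e.g. an Integer.
counts = Hash.new(0)
%w[dog car cat dog cat].each { |word| counts[word] += 1 }
counts #=> {"dog"=>2, "car"=>1, "cat"=>2}

# With a mutable default, << changes the default itself and stores nothing.
g = Hash.new([])
g[:cat] << "bo"
g          #=> {}
g.default  #=> ["bo"]
```

The first case works because counts[word] += 1 is an assignment (counts[word] = counts[word] + 1), whereas g[:cat] << "bo" only reads the default and mutates it.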
Giving Hash::new a block which returns the default value
The second form of default value is computed by a block each time a missing key is looked up. If we define the hash with a block:
h = Hash.new { |h,k| h[k] = [] }
then if h does not have a key k, h[k] will be set equal to the value returned by the block, in this case an empty array. Note that the block variable h is the hash on which h[k] was called (it shadows the outer variable of the same name). This allows us to write
h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
As the first element passed to each's block is arr.first, the block variables are assigned values by evaluating
k, v = arr.first
#=> [:dog, "fido"]
k #=> :dog
v #=> "fido"
The block calculation is therefore
h[k] << v
#=> h[:dog] << "fido"
but since h does not (yet) have a key :dog, the default block is triggered, setting h[:dog] equal to [], and then "fido" is appended to that empty array, so that
h #=> { :dog=>["fido"] }
Similarly, after the next two elements of arr are passed to the block we have
h #=> { :dog=>["fido"], :car=>["audi"], :cat=>["lucy"] }
When the next (fourth) element of arr is passed to the block, we evaluate
h[:dog] << "diva"
but now h does have a key :dog, so the default does not apply and we end up with
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy"]}
The last element of arr is processed similarly.
Note that, when using Hash::new with a block, we could write something like this:
h = Hash.new { launch_missiles("any time now") }
in which case h[k] would return the return value of launch_missiles for any missing key k (nothing would be stored in h, since this block makes no assignment). In other words, anything can be done in the block.
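The difference between a block that merely returns a value and one that also assigns it can be seen in a small sketch. The names lookups and lists and the message string are hypothetical, chosen only for illustration.

```ruby
# Hypothetical block that computes a value but does not assign it.
lookups = Hash.new { |hash, key| "no entry for #{key}" }
lookups[:dog]        #=> "no entry for dog"
lookups.key?(:dog)   #=> false
lookups              #=> {}

# Assigning inside the block is what makes the value stick.
lists = Hash.new { |hash, key| hash[key] = [] }
lists[:dog] << "fido"
lists                #=> {:dog=>["fido"]}
```

Only the second form is suitable for building up arrays, because each missing key gets its own new array stored in the hash.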
Even more Ruby-like
Lastly, the more Ruby-like way of writing
h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
is to use Enumerable#each_with_object:
arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |(k,v),h| h[k] << v }
#=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
which eliminates two lines of code.
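One detail worth spelling out: each_with_object yields two arguments to its block, the current element and the object being built, so the [key, value] pair has to be destructured with parentheses. A minimal sketch, using the hypothetical names memo and result for clarity:

```ruby
arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]

# each_with_object yields (element, memo); the parentheses split the pair.
result = arr.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(k, v), memo|
  memo[k] << v
end

result #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
```

The method returns the memo object itself, which is why no separate final line returning the hash is needed.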
Which is best?
Personally, I am indifferent between the second and third solutions. Both are used in practice.