
I'm going through Ruby Koans and am having a little trouble understanding when this code will be run:

hash = Hash.new {|hash, key| hash[key] = [] }

If there are no values in the hash, when does the new array get assigned to a given key in the Hash? Does it happen the first time a hash value is accessed without first assigning it? Please help me understand when exactly default values are created for any given hash key.



For the benefit of those new to Ruby, I have discussed alternative approaches to the problem, including the one that is the substance of this question.

The task

Suppose you are given an array

arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]  

and wish to create the hash

{ :dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"] }

First solution

h = {}
arr.each do |k,v|
  h[k] = [] unless h.key?(k)
  h[k] << v
end
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

This is quite straightforward.

Second solution

More Ruby-like is to write:

h = {}
arr.each { |k,v| (h[k] ||= []) << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

When Ruby sees (h[k] ||= []) << v, the first thing she does is, in effect, expand it to

(h[k] = h[k] || []) << v

If h does not have a key k, h[k] #=> nil, so the expression becomes

(h[k] = nil || []) << v

which becomes

(h[k] = []) << v

so

h[k] #=> [v]

Note that h[k] on the left of the equals sign uses the method Hash#[]=, whereas h[k] on the right employs Hash#[].

This solution requires that none of the hash values be nil (or false), because ||= treats an existing nil or false value the same way it treats a missing key.
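A quick check of that caveat. This is a small sketch that assumes a hash which already holds a nil value (the :dog entry is made up for illustration). ||= cannot tell a stored nil apart from a missing key, so it replaces the nil with a fresh array:

```ruby
# Sketch: ||= only assigns when the current value is nil or false.
h = { dog: nil, cat: ["lucy"] }   # hypothetical starting hash

(h[:dog] ||= []) << "fido"  # h[:dog] is nil, so a new [] is stored, then "fido" appended
(h[:cat] ||= []) << "bo"    # h[:cat] is truthy, so the existing array is reused

h[:dog]  #=> ["fido"]
h[:cat]  #=> ["lucy", "bo"]
```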

Third solution

A third approach is to give the hash a default value. If a hash h does not have a key k, h[k] returns the default value. There are two types of default values.

Passing the default value as an argument to Hash::new

If an empty array is passed as an argument to Hash::new, that value becomes the default value:

a = []
a.object_id
  #=> 70339916855860
g = Hash.new(a)
  #=> {}

g[k] returns a (initially an empty array) when g does not have a key k. (The hash is not altered, however.) This construct has important uses, but it is inappropriate here. To see why, suppose we write

x = g[:cat] << "bo"
  #=> ["bo"] 
y = g[:dog] << "diva"
  #=> ["bo", "diva"] 
x #=> ["bo", "diva"]

This is because g[:cat] and g[:dog] both return the same object, the default array a, so both << operations modify that one array. (Neither :cat nor :dog is added to g.) We can see this by examining object_ids:

x.object_id
  #=> 70339916855860 
y.object_id
  #=> 70339916855860 
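The same point can be checked without comparing object_id numbers. Here is a short sketch using the names a, g, x and y from above. It shows that both lookups hand back the default object itself and that g stays empty:

```ruby
a = []
g = Hash.new(a)

x = g[:cat] << "bo"    # g[:cat] returns a; "bo" is appended to a
y = g[:dog] << "diva"  # g[:dog] returns the same a; "diva" is appended to it

x.equal?(y)    #=> true   (one object, not two)
x.equal?(a)    #=> true
a              #=> ["bo", "diva"]
g.key?(:cat)   #=> false  (nothing was stored in g)
g.size         #=> 0
```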

Giving Hash::new a block which returns the default value

The second form of default value is to perform a block calculation. If we define the hash with a block:

h = Hash.new { |h,k| h[k] = [] }

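then if h does not have a key k, h[k] calls the block with the hash and k and returns the block's value, in this case an empty array. Because the block assigns h[k] = [], that empty array is also stored in the hash under k. Note that the block variable h is the hash itself (the very object being created), not a copy. This allows us to write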

h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}
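To see what the block actually receives, here is a small sketch. The variable seen is a hypothetical helper, and the block variables are renamed hash and key so they do not shadow the outer h. It records the block's arguments on the first missing-key lookup:

```ruby
seen = nil  # will hold [hash passed to the block, key passed to the block]
h = Hash.new { |hash, key| seen = [hash, key]; hash[key] = [] }

h[:dog] << "fido"

seen[0].equal?(h)  #=> true   (the block receives h itself)
seen[1]            #=> :dog   (and the key that was missing)
h[:dog]            #=> ["fido"]
```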

As the first element passed to each's block is arr.first, that block's variables are assigned values by evaluating

k, v = arr.first
  #=> [:dog, "fido"] 
k #=> :dog 
v #=> "fido" 

The block calculation is therefore

h[k] << v
  #=> h[:dog] << "fido"

but since h does not (yet) have a key :dog, the block is triggered, setting h[k] equal to [], and "fido" is then appended to that empty array, so that

h #=> { :dog=>["fido"] }

Similarly, after the next two elements of arr are passed to the block we have

h #=> { :dog=>["fido"], :car=>["audi"], :cat=>["lucy"] }

When the next (fourth) element of arr is passed to the block, we evaluate

h[:dog] << "diva"

but now h does have the key :dog, so the block is not called and we end up with

h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy"]} 

The last element of arr is processed similarly.
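The walkthrough above can be reproduced step by step. This is a minimal sketch in which the array triggered is a hypothetical log of the keys for which the default block ran. It prints, for each element of arr, whether the block fired:

```ruby
arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]
triggered = []  # keys for which the default block ran, in order

h = Hash.new { |hash, key| triggered << key; hash[key] = [] }

arr.each do |k, v|
  count_before = triggered.size
  h[k] << v
  puts "#{k.inspect} << #{v.inspect}: block ran? #{triggered.size > count_before}"
end
# Expected output:
# :dog << "fido": block ran? true
# :car << "audi": block ran? true
# :cat << "lucy": block ran? true
# :dog << "diva": block ran? false
# :cat << "bo": block ran? false
```

The block fires only on the first appearance of each key, which is exactly when the default array is created.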

Note that, when using Hash::new with a block, we could write something like this:

h = Hash.new { launch_missiles("any time now") }

in which case h[k] would return the value returned by launch_missiles. Because that block does not assign anything to h[k], however, k would not be added to h. In other words, anything can be done in the block.
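A harmless stand-in for launch_missiles makes the difference visible. This sketch assumes a block that computes a value (key.to_s.upcase, chosen only for illustration) but never assigns it. Such a block runs on every lookup and leaves the hash empty:

```ruby
lookups = 0  # counts how often the block runs
h = Hash.new { |_hash, key| lookups += 1; key.to_s.upcase }  # computes but never assigns

h[:dog]   #=> "DOG"
h[:dog]   #=> "DOG"  (block runs again, because nothing was stored)
h.empty?  #=> true
lookups   #=> 2
```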

Even more Ruby-like

Lastly, the more Ruby-like way of writing

h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

is to use Enumerable#each_with_object:

arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |(k,v),h| h[k] << v }
  #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

which eliminates two lines of code.
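Because each_with_object yields the element first and the memo (the hash) second, the pair has to be destructured with (k, v). Below is a sketch of that version. It also notes one side effect: the returned hash keeps its default block. Clearing it with default_proc = nil is optional and shown only as one possible choice:

```ruby
arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]

# each_with_object yields (element, memo); |(k, v), memo| unpacks each pair.
h = arr.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(k, v), memo|
  memo[k] << v
end

h[:dog]   #=> ["fido", "diva"]

# The returned hash still has its default block, so a later lookup of a
# missing key would silently add that key. Clearing the proc turns this off:
h.default_proc = nil
h[:cow]        #=> nil
h.key?(:cow)   #=> false
```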

Which is best?

Personally, I am indifferent between the second and third solutions. Both are used in practice.

Cary Swoveland

The block is called when you look up a key that the hash does not contain (for example with hash[key]). In that specific case:

hash["d"] # calls the block, which stores [] as the value of key "d"
hash["d"] # returns [] (the stored value; the block is not called again)

For more information, visit: https://docs.ruby-lang.org/en/2.0.0/Hash.html

If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block's responsibility to store the value in the hash if required.
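A small sketch that counts block calls makes the timing concrete. The counter calls is only for illustration. Plain [] lookups of a missing key run the block, but key? and fetch do not, because fetch ignores the hash's default block:

```ruby
calls = 0  # counts block invocations
hash = Hash.new { |h, key| calls += 1; h[key] = [] }

hash.key?("d")          #=> false  (key? never runs the block)
hash.fetch("d", :none)  #=> :none  (fetch ignores the hash's default block)
calls                   #=> 0

hash["d"]               #=> []  (block runs and stores [] under "d")
hash["d"]               #=> []  (key exists now; block not run)
calls                   #=> 1
hash.key?("d")          #=> true
```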

vl3

Makes life easier

This is a convenience for those times when you have a hash whose values are all arrays and you don't want to check each time whether the key is already there and its empty array already initialized before adding new elements. It allows this:

hash[:new_key] << new_element

instead of this:

hash[:new_key] = [] unless hash[:new_key] 
hash[:new_key] << new_element

Solves an older problem

It's also an alternative to the simpler way of specifying a default value for hashes, which looks like this:

hash = Hash.new([])

The problem with this approach is that the same array object is used as the default for all keys. So

hash = Hash.new([])
hash[:a] << 1
hash[:b] << 2

will return [1, 2] for hash[:a], hash[:b], or even hash[:foo] for that matter, while hash itself remains empty. That is not usually the desired or expected behavior.
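For contrast, here is a sketch of the same three lines run against the block form from the question. Each new key gets its own array, and the keys are actually stored:

```ruby
hash = Hash.new { |h, k| h[k] = [] }  # the form from the question
hash[:a] << 1
hash[:b] << 2

hash[:a]                   #=> [1]
hash[:b]                   #=> [2]
hash.keys                  #=> [:a, :b]
hash[:a].equal?(hash[:b])  #=> false  (each key got its own array)
```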

Scott Schupbach