2

This is related to Ruby hash default value behavior

But maybe the explanation there doesn't cover this part: it seems that a Ruby Hash's default value behaves differently depending on whether you "read it" or look at "what is set"?

One example is:

foo = Hash.new([])

foo[123].push("hi")

p foo       # => {}
p foo[123]  # => ["hi"]
p foo       # => {}

How foo[123] can have a value while foo itself is empty is somewhat beyond my comprehension. The only way I can make sense of it is that a Ruby Hash keeps a separate list for the "read" or "getter" path, while the values actually assigned "internally" are stored somewhere else.

If one of Ruby's design principles is "to have the least amount of surprise to the programmers", then foo being empty while foo[123] returns something is, in this case, a surprise to me.

(I haven't seen that in other languages actually... if there is a case where another language has similar behavior, maybe it is easier to make a connection.)

nonopolarity

4 Answers

3

Suppose

h = Hash.new(:cat)
h[:a] = 1
h[:b] = 2
h #=> {:a=>1, :b=>2}

Now

h[:a] #=> 1
h[:b] #=> 2
h[:c] #=> :cat
h[:d] #=> :cat
h     #=> {:a=>1, :b=>2}

h = Hash.new(:cat) defines an empty hash h with a default value of :cat. This means that if h does not have a key k, h[k] will return :cat, nothing more, nothing less. As you can see above, executing h[k] does not change the hash when k is :c or :d.

On the other hand,

h[:c] = h[:c]
  #=> :cat
h #=> {:a=>1, :b=>2, :c=>:cat}

Confused? Let me write this without the syntactic sugar:

h.[]=(:d, h.[](:d))
  #=> :cat
h #=> {:a=>1, :b=>2, :c=>:cat, :d=>:cat}

The default value is returned by h.[](:d) (i.e., h[:d]) whereas Hash#[]= is an assignment method (that takes two arguments, a key and a value) to which the default does not apply.
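The difference between reading and assigning can be checked directly. The following is a minimal sketch that assumes nothing beyond the standard Hash#key?, Hash#fetch and Hash#default methods; the fresh hash and the key :c are only illustrative.

```ruby
h = Hash.new(:cat)

h[:c]            # returns the default; nothing is stored
h.key?(:c)       #=> false
h.default        #=> :cat

# Hash#fetch ignores the default and raises for a missing key.
fetch_error = begin
  h.fetch(:c)
  nil
rescue KeyError => e
  e
end
fetch_error.class  #=> KeyError

h[:c] = h[:c]    # Hash#[]= stores whatever value it is given
h.key?(:c)       #=> true
h.keys           #=> [:c]
```

Only the explicit call to Hash#[]= adds the key; the earlier read leaves the hash empty.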

A common use of this default is to create a counting hash:

a = [1,3,1,4,2,5,4,4]
h = Hash.new(0)
a.each { |x| h[x] = h[x] + 1 }
h #=> {1=>2, 3=>1, 4=>3, 2=>1, 5=>1}

Initially, when h is empty and x #=> 1, h[1] = h[1] + 1 will evaluate to h[1] = 0 + 1, because (since h has no key 1) h[1] on the right side of the assignment returns the default value of zero. The next time 1 is passed to the block (x #=> 1), h[1] = h[1] + 1 evaluates to h[1] = 1 + 1. This time the default value is not used because h now has a key 1.

Incidentally, this would normally be written:

a.each_with_object(Hash.new(0)) { |x,h| h[x] += 1 }
  #=> {1=>2, 3=>1, 4=>3, 2=>1, 5=>1}

One generally does not want the default value to be a collection, such as an array or hash. Consider the following:

h = Hash.new([])
[1,2,3].map { |n| h[n] = h[n] }
h #=> {1=>[], 2=>[], 3=>[]}

Now observe:

h[1] << 2
h #=> {1=>[2], 2=>[2], 3=>[2]}

This is normally not the desired behaviour. It has happened because

h.map { |k,v| v.object_id }
  #=> [25886508, 25886508, 25886508]

That is, all the values are the same object, so if the value of one key is changed the values of all other keys are changed as well.

The way around this is to use a block when defining the hash:

h = Hash.new { |h,k| h[k]=[] }
[1,2,3].each { |n| h[n] = h[n] }
h #=> {1=>[], 2=>[], 3=>[]}
h[1] << 2
h #=> {1=>[2], 2=>[], 3=>[]}
h.map { |k,v| v.object_id }
  #=> [24172884, 24172872, 24172848]

When the hash h does not have a key k, the block { |h,k| h[k]=[] } is executed; it stores a new empty array under that key and returns it, so each key gets its own array.
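As a hedged illustration of why the assignment inside the block matters, the sketch below compares a block that stores its result with one that only returns a value. The variable names storing and non_storing are made up for this example.

```ruby
# Block that stores its result: each missing key gets its own array.
storing = Hash.new { |hash, key| hash[key] = [] }
storing[1] << 123
storing[2] << 456
storing.keys                   #=> [1, 2]
storing[1].equal?(storing[2])  #=> false

# Block that only returns a value: nothing is ever stored.
non_storing = Hash.new { [] }
non_storing[1] << 123          # appends to a throwaway array
non_storing.keys               #=> []
non_storing[1]                 #=> []
```

With Hash.new { [] } every lookup builds a fresh array that is discarded afterwards, which is why the pushed values never show up in the hash.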

Cary Swoveland
  • so then, it looks like `h = Hash.new {|hsh, key| hsh[key] = [] }` is what would be desired... `h = Hash.new { [] }` doesn't have the effect so that `h[1].push(123)` and `h[2].push(456)` and then `p h` will show the hash with 2 keys and 2 arrays – nonopolarity Apr 30 '17 at 14:35
  • You are correct. Thanks. I edited. – Cary Swoveland Apr 30 '17 at 16:04
  • The [documentation](http://ruby-doc.org/core-2.4.1/Hash.html#method-c-new) could be a little clearer on this: "If _obj_ is specified, this single object will be used for all _default values_. If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required." The last sentence implies but doesn't explicitly specify how the default value works. – mu is too short Apr 30 '17 at 17:26
  • @muistooshort, I experienced a brain vapour-lock. (The block returns the object regardless of whether it's the last calculation, of course.) Deleted my earlier comments. Yes on `{ |h,k| [] }` (same as `{ [] }`), which in my example results in, for example, `[] << 1 #=> [1]`, but since `[1]` is not tied to the hash, it is merely garbage-collected and `{}` is returned by the block. Readers should take note that the block can also contain statements that perform actions that have nothing to do with computing a default value. – Cary Swoveland Apr 30 '17 at 20:29
2

The statement:

foo = Hash.new([])

creates a new Hash that has an empty array ([]) as its default value. The default value is the value returned by Hash#[] when its argument is not a key present in the hash.

The statement:

foo[123]

invokes Hash#[] and, because the hash is empty (the key 123 is not present in the hash), it returns a reference to the default value, which is an object of type Array. The statement above doesn't create the 123 key in the hash.

Ruby objects are always passed and returned by reference. This means that the statement above doesn't return a copy of the default value of the hash but a reference to it.

The statement:

foo[123].push("hi")

modifies the above-mentioned array. Now the default value of the foo hash is not an empty array any more; it is the array ["hi"]. But the hash is still empty; none of the above statements added any (key, value) pair to it.

"How is it that foo[123] has a value"

foo[123] doesn't have any value; the key 123 is not present in the hash (the hash is empty). A subsequent call to foo[123] returns a reference to the default value again, and the default value is now ["hi"]. A call to foo[456] or foo['abc'] also returns a reference to the same default value.
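One way to confirm this, sketched here with only the standard Hash#default, Hash#key? and BasicObject#equal? methods (the variable shared is introduced just for the check):

```ruby
foo = Hash.new([])
shared = foo.default          # the one array passed to Hash.new

foo[123].push("hi")           # mutates the default; stores nothing

foo.empty?                    #=> true
foo.key?(123)                 #=> false
foo.default                   #=> ["hi"]
foo[123].equal?(shared)       #=> true
foo[456].equal?(foo["abc"])   #=> true
```

Every lookup of a missing key hands back the very same Array object, so there is no separate "read" store inside the hash.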

axiac
  • so maybe in most situations, that really is not a desired behavior? who want to "morph" a default value? it has side effect and is dangerous – nonopolarity May 01 '17 at 07:17
  • The side effect comes from the incorrect interpretation of what the line `foo[123].push("hi")` does. If you want it to set the array `["hi"]` at key `123` in the `foo` hash then this approach is valid in PHP f.e. (with a different syntax) but not in Ruby. – axiac May 01 '17 at 07:21
  • I think the bottom line here is: in Ruby, the method [`Hash#[]`](https://ruby-doc.org/core-2.4.0/Hash.html#method-i-5B-5D) never creates an entry in the hash when a non-existing key is accessed (PHP, for example, does it in some contexts). You have to use [`Hash#[]=`](https://ruby-doc.org/core-2.4.0/Hash.html#method-i-5B-5D-3D) in order to create a new entry in the hash. And don't forget Ruby handles everything using references. – axiac May 01 '17 at 07:38
0

You didn't actually change the value of key 123, you're just accessing the default value [] you provided during initialization. You can confirm this if you inspect a different value like foo[0].

If you did this instead:

foo[123] = ["hi"]

you would see the new entry, because you've stored a new array under the key 123.

Edit

  • When you call foo[123].push("hi"), you're mutating the (default) value instead of adding a new entry.

  • Calling foo[123] += ["hi"] creates a new array under the given key, replacing the previous one if it existed, which will show the behavior you desire.
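The two bullets above can be sketched as follows. This is an illustrative example only; the keys :bar and 456 are arbitrary, and it relies on Array#+ returning a new array while Array#push mutates its receiver.

```ruby
foo = Hash.new([])

foo[123] += ["hi"]        # same as foo[123] = foo[123] + ["hi"]
foo.keys                  #=> [123]
foo[123]                  #=> ["hi"]
foo.default               #=> []   (Array#+ built a new array)

foo[:bar].push("oops")    # mutates the shared default instead
foo.keys                  #=> [123]
foo.default               #=> ["oops"]

foo[456] += ["hello"]     # now starts from the mutated default
foo[456]                  #=> ["oops", "hello"]
```

The last line shows why mixing the two styles is risky: once any code has pushed onto the default, later += calls on missing keys inherit its contents.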

sudee
  • I think the confusion comes from "mutating" or not mutating... it is not relevant. The key point is that it is **one instance of Array**, sitting there, so whenever the key doesn't exist, then this very same instance is returned... usually not what the programmer wants – nonopolarity Apr 30 '17 at 09:01
  • the usage `foo[123] += ["hi"]` works. However, if some other programmers actually did a `foo[:bar].push("hi")`, then it will start to have side effect for any other code that does `foo[123] += ["hi"]`. I really do not like side effect like this – nonopolarity Apr 30 '17 at 09:04
  • it is like a dangling default value (a dangling Array instance), subject to anybody's tampering... it seems dangerous – nonopolarity Apr 30 '17 at 09:07
  • It is the way hashes handle default value. If, during lookup, no such key exists, it will just return the single default value set, which is an array reference in this instance. – sudee Apr 30 '17 at 09:11
  • Works just the same way with integers or strings as well, only we usually create new integers and strings instead of mutating the previous value. – sudee Apr 30 '17 at 09:13
  • See the difference: `h = Hash.new("hello"); h[1] += " world"; h[2] << " ruby"` – sudee Apr 30 '17 at 09:15
0

Printing out the hash with:

p foo

only prints the values stored in the hash. It does not display the default value (or anything added to the default array).

When you execute:

p foo[123]

the key 123 does not exist, so it accesses the default value.

If you added two values to the default value:

foo[123].push("hi")
foo[456].push("hello")

your output would be:

p foo       # => {}
p foo[123]  # => ["hi","hello"]
p foo       # => {}

Here, the key 123 still does not exist in foo, so foo[123] prints the contents of the default value.

Paul Bentley
  • ok... I think I get it. It is one instance of an empty array. It won't be a `dup` of an empty array each time... so `foo[123]` and `foo[345]` all refer to this very same array instance... so it is a bit weird... a dictionary that has its key and value pairs, and any non-key maps to this one single array instance... – nonopolarity Apr 30 '17 at 08:51
  • it can be error prone too... if some code is `foo[key] = foo[key] || []`... and then if someone else accidentally set `foo[:wah]`... then my first line of code is "messed up". Then I may as well just use `foo = Hash.new()` and not worry about this dangling array instance. – nonopolarity Apr 30 '17 at 08:53
  • `.push()` is not a `Hash` method. It is part of the `Array` class. You get an `Array` as your default value because you specify it in `foo = Hash.new([])`. – Paul Bentley Apr 30 '17 at 08:55
  • I think the main point is that it is one instance of Array, not multiple... oh well... or use `Hash.new {|hsh, key| hsh[key] = [] }` if you want the behavior you were hoping for... it is a bit verbose – nonopolarity Apr 30 '17 at 08:57
  • Take a look at my answer, I provided an alternative way to update keys. :) – sudee Apr 30 '17 at 08:59