If...

variable = Hash.new(0)

...will default to new values being the integer zero without having to specify the associated key, why do I have to use a block and specify the associated key for the new values to default to an array, like so...

variable = Hash.new { |h, k| h[k] = [] }

I read ruby-doc.org but can't seem to find an answer. Perhaps it's "under the hood" and I can't see/comprehend it.

For context, the question came up when I couldn't reconcile why the first method didn't work and the second method did:

def find_duplicates1(array)
    indices = Hash.new([])
    array.each_with_index { |ele, i| indices[ele] << i }
    indices.select { |ele, indices| indices.length > 1 }
end

def find_duplicates2(array)
    indices = Hash.new { |h, k| h[k] = [] }
    array.each_with_index { |ele, i| indices[ele] << i }
    indices.select { |ele, indices| indices.length > 1 }
end
Corey Stewart
  • 1
    See the docs for [`Hash` – Default Values](https://ruby-doc.org/core-3.0.1/Hash.html#class-Hash-label-Default+Values) – the last example shows why `Hash.new([])` doesn't work as expected. – Stefan May 12 '21 at 14:27
  • I saw that, but still didn't quite comprehend 'why' that was the case. It was further fuzzy to me because of the 'not advised' and 'recommended' language. – Corey Stewart May 12 '21 at 18:29

2 Answers

3

Because indices = Hash.new([]) means that when the hash is accessed with an unknown key, the default array [] is returned. But that default array is not assigned to the previously unknown key.

Here is an example:

indices = Hash.new([])
indices[:foo] << :bar
indices
#=> {}

But even worse: because we added a value to the default array, that array is no longer empty, and the changed default value will be returned for all other unknown keys too:

indices[:baz]
#=> [:bar]
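
To make the sharing visible, here is a minimal sketch that checks object identity with equal? and inspects Hash#default. The method name shared_default_demo is only made up for illustration.

```ruby
# Sketch: Hash.new(obj) keeps ONE object and returns it for every missing key.
# shared_default_demo is a hypothetical helper name used only for this example.
def shared_default_demo
  indices = Hash.new([])
  indices[:foo] << :bar # mutates the default array; no key is stored

  {
    keys: indices.keys,                                # nothing was assigned
    same_object: indices[:foo].equal?(indices[:baz]),  # both lookups return the default
    default: indices.default                           # the mutated default array
  }
end

result = shared_default_demo
p result[:keys]        # => []
p result[:same_object] # => true
p result[:default]     # => [:bar]
```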

Whereas indices = Hash.new { |h, k| h[k] = [] } means that the block runs for every unknown key. Within the block, a new empty array is created, and that new array is actually assigned to the previously unknown key.

indices = Hash.new { |h, k| h[k] = [] }
indices[:foo] << :bar
indices 
#=> {:foo=>[:bar]}

indices[:bar] 
#=> []

Btw you might be interested in the Enumerable#tally method (available since Ruby 2.7). If you only need the duplicated elements themselves rather than their indices, your method can be simplified to:

def find_duplicates(array)
  array.tally.select { |k, v| v > 1 }.keys
end
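
If the positions are still needed, as in the find_duplicates2 from the question, one possible sketch avoids the default-value question entirely by grouping with each_with_index and group_by. The method name duplicate_indices is an assumption for illustration, not something from the question.

```ruby
# Sketch: map each repeated element to the indices where it occurs.
# duplicate_indices is a hypothetical name; transform_values needs Ruby 2.4+.
def duplicate_indices(array)
  array.each_with_index
       .group_by { |ele, _i| ele }                      # { ele => [[ele, i], ...] }
       .transform_values { |pairs| pairs.map(&:last) }  # keep only the indices
       .select { |_ele, idxs| idxs.length > 1 }         # drop elements seen once
end

dups = duplicate_indices(%w[a b a c b a])
p dups["a"]      # => [0, 2, 5]
p dups["b"]      # => [1, 4]
p dups.key?("c") # => false
```
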
spickermann
1

It's because the default (whatever object it is) is returned as-is. That same object is returned for EVERY undefined key, so all those lookups point to one object.

For immutable objects (like the integer 0) it doesn't matter because if you replace 0 with 1 for a given key, then the key is pointing to a new object (the integer 1).
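
A small sketch of the integer case, assuming a made-up helper called count_words: the += operator reads the default and then assigns a new Integer to the key, so the shared default 0 is never modified.

```ruby
# Sketch: why Hash.new(0) is safe for counters.
# count_words is a hypothetical helper name used only for this example.
def count_words(words)
  counts = Hash.new(0)
  # counts[w] += 1 expands to counts[w] = counts[w] + 1:
  # the default 0 is read, a new Integer is computed, and []= stores it under w.
  words.each { |w| counts[w] += 1 }
  counts
end

counts = count_words(%w[a b a])
p counts["a"]      # => 2
p counts["b"]      # => 1
p counts.default   # => 0
p counts.key?("z") # => false
```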

But if it's an array object and you "mutate" (change) it like array << "added", then that same object, now containing "added", is what every later lookup of a missing key returns. The keys themselves are never stored in the hash; each such lookup just hands back the single array object that looks like: ["added"]

By using a block, you assign a NEW array object to each missing key. If you change one array object by adding an element, the other keys' objects are unchanged (they're different objects).
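
As a rough check of that claim, this sketch compares two keys of a block-initialized hash with equal?. The method name block_default_demo is invented for the example.

```ruby
# Sketch: with a block that assigns, every missing key gets its own array.
# block_default_demo is a hypothetical helper name used only for this example.
def block_default_demo
  indices = Hash.new { |hash, key| hash[key] = [] }
  indices[:foo] << :bar # block runs, stores a new array under :foo, then << appends

  {
    foo: indices[:foo],
    baz: indices[:baz],                                # block runs again for :baz
    same_object: indices[:foo].equal?(indices[:baz]),
    keys: indices.keys
  }
end

result = block_default_demo
p result[:foo]         # => [:bar]
p result[:baz]         # => []
p result[:same_object] # => false
p result[:keys]        # => [:foo, :baz]
```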

SteveTurczyn
  • 1
    _"will be assigned"_ could be misleading. For an undefined key, the default is returned but is not being _assigned_ to that key – the key will still be undefined afterwards. – Stefan May 12 '21 at 14:17
  • 1
    The second paragraph is exactly what I needed (weirdly, knowing WHY something DIDN'T work was more helpful than how things DO work). Thanks Steve! Reflecting back, I knew this separately, but needed to see it spelled out to connect the dots and complete the mental model in my head. – Corey Stewart May 12 '21 at 18:50