I'm going through Ruby Koans, and I hit #41 which I believe is this:

def test_default_value_is_the_same_object
  hash = Hash.new([])

  hash[:one] << "uno"
  hash[:two] << "dos"

  assert_equal ["uno","dos"], hash[:one]
  assert_equal ["uno","dos"], hash[:two]
  assert_equal ["uno","dos"], hash[:three]

  assert_equal true, hash[:one].object_id == hash[:two].object_id
end

I could not understand the behavior, so I Googled it and found "Strange ruby behavior when using Hash default value, e.g. Hash.new([])", which answered the question nicely.

So I understand how that works. My question is: why does a default value such as an integer that gets incremented not get changed during use? For example:

puts "Text please: "
text = gets.chomp

words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }

This takes user input and counts the number of times each word is used. It works because the default value of 0 is always used.

I have a feeling it has to do with the << operator but I'd love an explanation.

Andrew Marshall
Jake Sellers
  • I believe I saw '<<' referred to as scoop, could be totally wrong. – Jake Sellers Apr 23 '13 at 01:38
  • You're mistaken, I don't think anybody has ever called it that. The only person to do so, according to Google, is *you*. The first and only relevant result is this very question: https://www.google.ca/search?q=ruby+%22scoop+operator%22&aq=f&oq=ruby+%22scoop+operator%22&aqs=chrome.0.57j60j65l3j59.4906j0&sourceid=chrome&ie=UTF-8#q=ruby+%22scoop+operator%22&nfpr=1&sa=X&ei=LuZ1UeWKFcOiqgGn0oDIDw&ved=0CDEQvgUoAQ&bav=on.2,or.r_cp.r_qf.&bvm=bv.45512109,d.aWM&fp=4d9df6f102356d32&biw=1740&bih=1041 – user229044 Apr 23 '13 at 01:39
  • Perhaps a confusion with `::`, sometimes called the scope resolution operator. – user2398029 Apr 23 '13 at 01:44
  • No, I just checked: one of the tuts I had gone over referred to it as "the shovel" and I mis-remembered. The proper name is simply the concatenation operator, I believe; prolly should have just gone with that. – Jake Sellers Apr 23 '13 at 01:46
  • It is not the concatenation operator either. It is the bitwise left-shift operator, which is also used as the append operator (for containers and streams). The concatenation operator is `+`. – Hauleth Apr 23 '13 at 01:53
  • I wrote [a blog post](https://medium.com/klaxit-techblog/a-headache-in-ruby-hash-default-values-bf2706660392) on that some time ago, if anyone is interested :) – Ulysse BN Jul 08 '21 at 09:00

3 Answers

The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer, decided to mutate one but not the other.

The question is not whether Arrays are mutable; the question is whether you mutate them.

You can get both the behaviors you see above, just by using Arrays. Observe:

One default Array with mutation

hsh = Hash.new([])

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!

One default Array without mutation

hsh = Hash.new([])

hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']

hsh[:nonexistent]
# => []
# We didn't mutate the default value, it is still an empty array

hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.

A new, different Array every time with mutation

hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!


hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it

hsh[:one] << 'one'
hsh[:two] << 'two'

hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.

hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
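
Applied to the word count from the question, `+=` is the "without mutation" pattern: it reads the default, builds a new value, and assigns it to the key. The following is only a sketch, and the sample text is an assumption standing in for user input.

```ruby
# A fixed string stands in for `gets.chomp` (assumption for illustration).
text = "the quick the lazy the"
frequencies = Hash.new(0)

text.split(" ").each do |word|
  # Shorthand for: frequencies[word] = frequencies[word] + 1
  # Integer#+ returns a new value, and []= stores it under the key.
  frequencies[word] += 1
end

the_count = frequencies["the"]     # 3
default_after = frequencies.default # still 0, the default was never touched
puts the_count
puts default_after
```

The hash now holds one entry per distinct word, while the default stays at 0 for any key that has not been assigned.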
Jörg W Mittag
  • For other people reading this answer, note that `:nonexistent` is just any name... it could have been `:foo` or `:bar` – nonopolarity Apr 30 '17 at 08:25
  • Note that when `Hash.new {|hsh, key| hsh[key] = [] }` is used, it is a new Array instance every time, versus, if it is `Hash.new([])`, it is the exact same Array instance every time when a key doesn't exist – nonopolarity Apr 30 '17 at 08:56
  • Y'all here for `hsh = Hash.new {|hsh, key| hsh[key] = [] }` – Epigene May 26 '22 at 09:48
This is because an Array in Ruby is a mutable object, so you can change its internal state, but a Fixnum isn't mutable. So when you increment a value using +=, this is what happens internally (assume that i is our reference to a Fixnum object):

  1. get the object referenced by i
  2. get its internal value (let's name it raw_tmp)
  3. create a new object whose internal value is raw_tmp + 1
  4. assign a reference to the created object to i

So as you can see, we created a new object, and i now refers to something different than at the beginning.
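
A minimal sketch of the steps above, using a second variable (the name `before` is just for illustration) to show that `+=` rebinds `i` rather than changing the object it used to refer to:

```ruby
i = 1
before = i   # a second reference to the same object

i += 1       # shorthand for: i = i + 1

puts i       # 2
puts before  # 1, the original object was not changed
```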

On the other hand, when we use Array#<<, it works this way:

  1. get the object referenced by arr
  2. append the given element to its internal state

So as you can see, it is much simpler, but it can cause some bugs. One of them is the one in your question; another is a thread race when two threads simultaneously try to append 2 or more elements: sometimes you can end up with only some of them and with garbage in memory. If you use += on arrays too, you will get rid of both of these problems (or at least minimise their impact).
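
The contrast between the two operations can be sketched with a second reference to the same array (the variable names here are illustrative only). `equal?` checks object identity:

```ruby
arr = []
other = arr                       # second reference to the same Array

arr << 1                          # mutates the existing object in place
same_after_append = arr.equal?(other)
puts same_after_append            # true
puts other.inspect                # [1], visible through both references

arr += [2]                        # builds a new Array and rebinds arr
same_after_plus = arr.equal?(other)
puts same_after_plus              # false
puts arr.inspect                  # [1, 2]
puts other.inspect                # [1], the old object is untouched
```

With `Hash.new([])`, the role of `other` is played by the hash's default value, which is why `<<` leaks into every missing key while `+=` does not.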

vgoff
Hauleth
From the doc, setting a default value has the following behaviour:

Returns the default value, the value that would be returned by hsh if key did not exist in hsh. See also Hash::new and Hash#default=.

Therefore, every time frequencies[word] is read for a key that is not set, the default value 0 is returned; the += then assigns the incremented value to that individual key.

The reason for the discrepancy between the two code blocks is that arrays are mutable in Ruby, while integers are not.
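
As a small sketch of what the default value does and does not do: reading a missing key returns the default without storing anything, and only an assignment (here via `+=`) adds the key. The key name `"word"` is arbitrary.

```ruby
frequencies = Hash.new(0)

value = frequencies["word"]                  # returns the default
stored_after_read = frequencies.key?("word") # reading did not store the key
puts value                                   # 0
puts stored_after_read                       # false

frequencies["word"] += 1                     # read default, add 1, assign
stored_after_assign = frequencies.key?("word")
puts stored_after_assign                     # true
puts frequencies["word"]                     # 1
puts frequencies.default                     # 0
```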

user2398029
  • Yes I'm interested in the apparent discrepancy between the behavior in the two code blocks. In the first, the default value is modified by use and in the second it seems immutable. – Jake Sellers Apr 23 '13 at 01:33