174

A lot of times people use symbols as keys in a Ruby hash.

What's the advantage over using a string?

E.g.:

hash[:name]

vs.

hash['name']
Max

4 Answers

244

TL;DR:

Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.

Ruby Symbols are immutable (can't be changed), which makes looking something up much easier.

Short(ish) answer:

Using symbols not only saves time when doing comparisons, but also saves memory, because they are only stored once.

Symbols in Ruby are basically "immutable strings". That means they cannot be changed, and it implies that the same symbol, when referenced many times throughout your source code, is always stored as the same entity, i.e. it has the same object id.

  a = 'name'
  a.object_id
=> 557720

  b = 'name'
  b.object_id
=> 557740

  'name'.object_id
=> 1373460

  'name'.object_id
=> 1373480          # !! different entity from the one above

# Ruby assumes any string can change at any point in time, 
# therefore treating it as a separate entity

# versus:

  :name.object_id
=> 71068

  :name.object_id
=> 71068

# the symbol :name is a reference to the same unique entity

Strings, on the other hand, are mutable: they can be changed at any time. This implies that Ruby needs to store each string you mention throughout your source code as its own separate entity, i.e. if the string "name" is mentioned multiple times in your source code, Ruby needs to store each occurrence in a separate String object, because any of them might change later on (that's the nature of a Ruby string).

If you use a string as a Hash key, Ruby needs to look at the string's contents, compute a hash function on them, and compare the result against the (hashed) values of the keys which are already stored in the Hash.

If you use a symbol as a Hash key, it's implicit that it's immutable, so Ruby can basically just compare the (hash function of the) object id against the (hashed) object ids of the keys which are already stored in the Hash. That is much faster.
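As a rough sketch of the difference described above (the variable names are only illustrative, and `String.new` is used so the result does not depend on any frozen-string-literal setting), a String key is matched by content, while a Symbol is always the same object:

```ruby
# Two separate String objects with the same content.
first  = String.new('name')
second = String.new('name')

same_content = first == second        # true: contents are compared
same_object  = first.equal?(second)   # false: two distinct objects
same_symbol  = :name.equal?(:name)    # true: every :name is one object

# A String key can be found through any String with equal content,
# because Hash relies on the key's #hash and #eql? methods.
by_string = { 'name' => 'Max' }
found_by_other_string = by_string[second]   # => "Max"

by_symbol = { name: 'Max' }
found_by_symbol = by_symbol[:name]          # => "Max"

# In a plain Hash, Symbol and String keys are not interchangeable.
missing = by_symbol['name']                 # => nil
```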

Downside: Before Ruby 2.2, each symbol consumed a slot in the Ruby interpreter's symbol table that was never released, because symbols were never garbage-collected. A corner case is a program with a large number of symbols (e.g. auto-generated ones): on those versions you should evaluate how this affects the size of your Ruby interpreter, since Ruby can run out of memory if you generate too many symbols programmatically. Since Ruby 2.2, dynamically created symbols are garbage-collected.
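One possible way to limit programmatically generated symbols is to convert only keys you already know about and leave everything else as a String. The sketch below is an assumption for illustration: `symbolize_known_keys` and its `known_keys` default are hypothetical names, not part of Ruby.

```ruby
# Hypothetical helper: converts only a known set of keys to symbols,
# so arbitrary input never creates new symbols.
def symbolize_known_keys(params, known_keys: %w[name email])
  params.each_with_object({}) do |(key, value), result|
    result[known_keys.include?(key) ? key.to_sym : key] = value
  end
end

input = { 'name' => 'Max', 'x_random_123' => 'ignored' }
clean = symbolize_known_keys(input)
# => {:name=>"Max", "x_random_123"=>"ignored"}
```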

Notes:

If you do comparisons, Ruby can compare symbols just by comparing their object ids, without looking at their contents. That's much faster than comparing strings, whose contents have to be examined.

If you access a hash, Ruby always applies a hash-function to compute a "hash-key" from whatever key you use. You can imagine something like an MD5-hash. And then Ruby compares those "hashed keys" against each other.

Every time a string literal in your code is evaluated, a new instance is created. String creation is slower than referencing a symbol.

Starting with Ruby 2.1, when you use frozen strings, Ruby will use the same string object. This avoids having to create new copies of the same string, and they are stored in a space that is garbage collected.
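A small sketch of the frozen-string behaviour mentioned above, assuming Ruby 2.1 or later (on Ruby 2.3+ the `# frozen_string_literal: true` magic comment at the top of a file has a similar effect for literals):

```ruby
# Unfrozen strings built at runtime are separate objects.
a = String.new('name')
b = String.new('name')
distinct_objects = !a.equal?(b)   # true

# Frozen string literals with the same content share one object.
c = 'name'.freeze
d = 'name'.freeze
shared_object = c.equal?(d)       # true on Ruby 2.1+
```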

Long answers:

https://web.archive.org/web/20180709094450/http://www.reactive.io/tips/2009/01/11/the-difference-between-ruby-symbols-and-strings

http://www.randomhacks.net.s3-website-us-east-1.amazonaws.com/2007/01/20/13-ways-of-looking-at-a-ruby-symbol/

https://www.rubyguides.com/2016/01/ruby-mutability/

Tilo
  • 6
    Fyi, Symbols will be GCd in the next version of Ruby: https://bugs.ruby-lang.org/issues/9634 – Ajedi32 Sep 30 '14 at 14:46
  • 2
    Also, Strings are automatically frozen when used as Hash keys in Ruby. So it's not exactly true that Strings are mutable when talking about them in this context. – Ajedi32 Sep 30 '14 at 14:50
  • 1
    Great insight on the topic & First link in "Long answer" section is removed or migrated. – Hbksagar Dec 23 '14 at 17:06
  • 5
    Symbols are garbage collected in Ruby 2.2 – Marc-André Lafortune Jan 20 '15 at 18:28
  • Good explanation, but I am wondering why the following warns duplicated key: `my_hash = {"testing": 1, :testing => 2}` – WaiKit Kung Jul 03 '16 at 10:52
  • @WaiKitKung it looks like you are doing this in Rails, not just Ruby / IRB. Rails uses the class `HashWithIndifferentAccess` and that's why you see this warning. If you just start an IRB shell, you won't see the warning because it uses the Hash class – Tilo Jul 05 '16 at 20:46
  • Mixing 'string evaluation' concept into the answer does not make any sense to me. The hashmap key is not an expression to evaluate it. It is a hash of a string vs a hash of an integer (object_id). The second is slightly faster. That is it. – golem Oct 02 '16 at 23:58
  • 2
    Great answer! On a trolling side, your "short answer" is also long enough. ;) – technophyle Jul 19 '18 at 20:35
  • 1
    added TL;DR :-P – Tilo Jul 19 '18 at 21:16
  • freezing of strings was introduced in Ruby 2.3 -- after this was written ;) https://www.ruby-lang.org/en/news/2015/12/25/ruby-2-3-0-released/ – Tilo Jun 22 '20 at 17:37
  • If you want to play around, you can start irb with frozen strings like this: `RUBYOPT=--enable-frozen-string-literal irb` and look at `object_id` of different strings / symbols. – Tilo Jul 08 '21 at 10:27
23

The reason is efficiency, with multiple gains over a String:

  1. Symbols are immutable, so the question "what happens if the key changes?" doesn't need to be asked.
  2. Strings are duplicated in your code and will typically take more space in memory.
  3. Hash lookups must compute the hash of the keys to compare them. This is O(n) in the length of the key for Strings and constant for Symbols.
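The third point can be explored with a rough timing sketch using the standard `Benchmark` module. The key length and lookup count are arbitrary values chosen for illustration, and the timings will vary by machine and Ruby version, so treat the numbers as indicative only.

```ruby
require 'benchmark'

# Arbitrary sizes, chosen only for illustration.
key_length = 10_000
lookups    = 20_000

string_key = 'k' * key_length
symbol_key = string_key.to_sym

string_hash = { string_key => 1 }
symbol_hash = { symbol_key => 1 }

# Look up with a separate copy of the key, as a real caller usually would.
probe = string_key.dup

string_time = Benchmark.realtime { lookups.times { string_hash[probe] } }
symbol_time = Benchmark.realtime { lookups.times { symbol_hash[symbol_key] } }

puts format('String key: %.4fs', string_time)
puts format('Symbol key: %.4fs', symbol_time)
```

With short keys the gap is much smaller, which matches the "(very) slightly faster" assessment given elsewhere on this page.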

Moreover, Ruby 1.9 introduced a simplified syntax just for hashes with symbol keys (e.g. h.merge(foo: 42, bar: 6)), and Ruby 2.0 has keyword arguments that work only with symbol keys.
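For illustration, here is a minimal sketch of both features. The `greet` method is a made-up example, and its keywords have defaults so that it also fits Ruby 2.0's keyword arguments.

```ruby
# The Ruby 1.9+ shorthand and the older "hash rocket" form build the same Hash.
short = { foo: 42, bar: 6 }
long  = { :foo => 42, :bar => 6 }
same_hash = short == long          # true

# Hypothetical method using keyword arguments.
def greet(name: 'world', greeting: 'Hello')
  "#{greeting}, #{name}!"
end

# A Hash with symbol keys can be passed as keyword arguments.
options = { name: 'Max', greeting: 'Hi' }
message = greet(**options)         # => "Hi, Max!"
```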

Notes:

1) You might be surprised to learn that Ruby treats String keys differently than any other type. Indeed:

s = "foo"
h = {}
h[s] = "bar"
s.upcase!
h.rehash   # must be called whenever a key changes!
h[s]   # => nil, not "bar"
h.keys   # => ["foo"]
h.keys.first.upcase!  # => FrozenError: can't modify frozen String (TypeError in Ruby 1.8)

For string keys only, Ruby will use a frozen copy instead of the object itself.
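To make the contrast with other key types explicit, here is a short sketch (variable names are only illustrative) comparing an unfrozen String key with an Array key:

```ruby
s = String.new('foo')
h = {}
h[s] = 'bar'

stored_key = h.keys.first
key_is_copy     = !stored_key.equal?(s)   # true: the Hash holds its own copy
key_is_frozen   = stored_key.frozen?      # true: the copy is frozen
original_frozen = s.frozen?               # false: the original is untouched

# Other key types, such as an Array, are stored as the object itself.
a = [1, 2]
h[a] = 'baz'
array_key_is_same_object = h.keys.last.equal?(a)   # true
```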

2) The letters "b", "a", and "r" are stored only once for all occurrences of :bar in a program. Before Ruby 2.2, it was a bad idea to constantly create new Symbols that were never reused, as they would remain in the global Symbol lookup table forever. Ruby 2.2 and later garbage collect them, so no worries.

3) Actually, computing the hash for a Symbol didn't take any time in Ruby 1.8.x, as the object ID was used directly:

:bar.object_id == :bar.hash # => true in Ruby 1.8.7

In Ruby 1.9.x, this has changed as hashes change from one session to another (including those of Symbols):

:bar.hash # => some number that will be different next time Ruby 1.9 is run
Marc-André Lafortune
  • +1 for your excellent notes! I originally didn't mention the hash function in my answer, because I tried to make it easier to read :) – Tilo Nov 18 '11 at 22:04
  • @Tilo: indeed, that's why I wrote my answer :-) I just edited my answer to mention the special syntax in Ruby 1.9 and the promised named parameters of Ruby 2.0 – Marc-André Lafortune Nov 18 '11 at 23:00
  • Can you explain how Hash lookups are constant for Symbols and O(n) for Strings? – Asad Moosvi Jul 16 '17 at 15:52
7

Re: what's the advantage over using a string?

  • Styling: it's the Ruby way.
  • (Very) slightly faster value lookups, since hashing a symbol is equivalent to hashing an integer, versus hashing a string.

  • Disadvantage: before Ruby 2.2, each symbol consumed a slot in the program's symbol table that was never released.

Larry K
0

I'd be very interested in a follow-up regarding frozen strings introduced in Ruby 2.x.

When you deal with numerous strings coming from text input (I'm thinking of HTTP params or payloads, through Rack, for example), it's way easier to use strings everywhere.

When you deal with dozens of them but they never change (if they're your business "vocabulary"), I like to think that freezing them can make a difference. I haven't done any benchmarks yet, but I guess it would be close to the performance of symbols.

Kristaps Karlsons
jlecour