
I understand the theoretical difference between Strings and Symbols. I understand that Symbols are meant to represent a concept or a name or an identifier or a label or a key, and Strings are a bag of characters. I understand that Strings are mutable and transient, whereas Symbols are immutable and permanent. I even like how Symbols look different from Strings in my text editor.

What bothers me is that practically speaking, Symbols are so similar to Strings that the fact that they're not implemented as Strings causes a lot of headaches. They don't even support duck-typing or implicit coercion, unlike the other famous "the same but different" couple, Float and Fixnum.

The biggest problem, of course, is that hashes coming into Ruby from other places, like JSON and HTTP CGI, use string keys, not symbol keys, so Ruby programs have to bend over backwards to convert these keys either up front or at lookup time. The mere existence of HashWithIndifferentAccess, and its rampant use in Rails and other frameworks, demonstrates that there's a problem here, an itch that needs to be scratched.
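To make the mismatch concrete, here is a rough sketch of what that looks like in practice (the payload and key names are made up for illustration):

require 'json'

payload = JSON.parse('{"apples": 10}')   # keys arrive as Strings
payload["apples"]   #=> 10
payload[:apples]    #=> nil, the symbol key misses

# One common workaround: convert all the keys up front...
symbolized = Hash[payload.map { |key, value| [key.to_sym, value] }]
symbolized[:apples] #=> 10
# ...or wrap the hash in ActiveSupport's HashWithIndifferentAccess,
# which accepts either kind of key at lookup time.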

Can anyone tell me a practical reason why Symbols should not be frozen Strings? Other than "because that's how it's always been done" (historical) or "because symbols are not strings" (begging the question).

Consider the following astonishing behavior:

:apple == "apple"  #=> false, should be true

:apple.hash == "apple".hash #=> false, should be true

{apples: 10}["apples"]  #=> nil, should be 10

{"apples" => 10}[:apples]  #=> nil, should be 10

:apple.object_id == "apple".object_id #=> false, but that's actually fine

All it would take to make the next generation of Rubyists less confused is this:

class Symbol < String
  def initialize *args
    super
    self.freeze
  end
end

(and a lot of other library-level hacking, but still, not too complicated)
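For illustration only, here is a rough sketch of the sort of behavior I'm after, using a made-up SymbolishString class (an interned, frozen String subclass) rather than touching Symbol itself:

class SymbolishString < String
  @table = {}
  # Intern: hand back one shared, frozen instance per distinct text.
  def self.[](text)
    @table[text] ||= new(text).freeze
  end
end

a = SymbolishString["apple"]
a == "apple"                         #=> true, String#== compares contents
a.hash == "apple".hash               #=> true, String#hash is content-based
a.equal?(SymbolishString["apple"])   #=> true, interned: the very same object
{ a => 10 }["apple"]                 #=> 10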


Update: I think Matz makes the case for class Symbol < String very well here: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/9192 (thanks to Azolo for digging this up, and also Matz' eventual retraction).

AlexChaffee
  • Excellent question! I remember being plagued by this question when I started learning Ruby. However, with the passage of time I just forgot about it – sohaibbbhatti Jun 18 '12 at 15:14
  • +1 for proper usage of "begging the question" (and a generally clear and well-expressed question) – Phrogz Jun 18 '12 at 15:33
  • `{apples: 10}["apples"]` should be 10? So `{1 => "foo"}[1.0]` should be `"foo"`, since you mentioned `Fixnum` and `Float` as examples of classes where it's done "right"? – Michael Kohl Jun 18 '12 at 16:27
  • Also in your proposed solution, how do you then use a symbol as a hash key? `hash["key"]` won't work, since `"key"` is a string literal. Do you really want `hash[Symbol.new(key)]` or having to assign the key to a variable first? Symbols are implemented the way they are to allow for fast lookup (basically comparing an integer), which is exactly what you want for hash keys. – Michael Kohl Jun 18 '12 at 16:32
  • @Michael Kohl, you're quite right that I'm proposing that [] treat String and Symbol the same, but only because Symbol would be a subclass of String, returning the same hashcode and satisfying == via normal, existing semantics. The parallel with Float/Fixnum is not 100%. As for "how do you use a symbol as a hash key", I don't see the problem -- hash[:key] would still work fine. I don't want to change the language grammar at all, just slightly change the standard library. – AlexChaffee Jun 18 '12 at 17:05
  • I'm confused. In your first paragraph you lay out a quite detailed explanation of why symbols and strings are fundamentally different, completely dissimilar, absolutely nothing like each other. Then, in the second paragraph you claim that they are almost the same? – Jörg W Mittag Jun 18 '12 at 17:12
  • Jörg, they are similar in that if symbols did not exist, their function could be almost entirely duplicated by using (frozen) strings. They are semantically distinct but functionally very similar, and it's surprising to most newcomers that they're not related. Just like JavaScript gets by without an integer type (all JS numbers are floating-point), Ruby could get by without a symbol type. For another example, a Stack is different from an Array, but Matz had no problem adding push and pop to Array; in the same way a string can behave like a symbol (and with no extra methods). – AlexChaffee Jun 18 '12 at 21:14
  • I think Matz makes the case for `class Symbol < String` very well here: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/9192 – AlexChaffee Jun 19 '12 at 01:24
  • Also, check out this short screencast for the clear way Smalltalk does it. http://www.cincomsmalltalk.com/blog/blogView?entry=3422855520 "Symbols are Strings that are represented uniquely." Simple, no? – AlexChaffee Aug 26 '12 at 18:35
  • This question might be better answered if asked on the Ruby mailing list. – Ajedi32 Jun 06 '14 at 15:39

6 Answers


This answer is drastically different from my original answer, but I ran into a couple of interesting threads on the Ruby mailing list. (Both are good reads.)

So, at one point in 2006, Matz implemented the Symbol class as Symbol < String. Then the Symbol class was stripped down to remove any mutability. So a Symbol was in fact an immutable String.

However, it was reverted. The reason given was:

Even though it is highly against DuckTyping, people tend to use case on classes, and Symbol < String often cause serious problems.

So the answer to your question is still: a Symbol is like a String, but it isn't.
The problem isn't that a Symbol shouldn't be a String, but that it historically wasn't.
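Here's a small sketch of the breakage that quote is describing, hypothetically pretending Symbol < String (which is not the case in current Ruby):

def describe(key)
  case key
  when String then "a String"   # `when String` uses Module#===, which matches subclass instances,
  when Symbol then "a Symbol"   # so under Symbol < String this branch would never be reached
  else "something else"
  end
end

describe("apple")  #=> "a String"
describe(:apple)   #=> "a Symbol" today, but it would become "a String" under Symbol < String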

Azolo
  • I found a couple mailing list threads about this. Changed my answer quite a bit. – Azolo Jun 18 '12 at 23:59
  • So the answer really is purely historical. I realized that people would need to revise their programs (e.g. changing `if foo.is_a? String` to `unless foo.is_a? Symbol` sometimes) but the subtleties of the `case` statement didn't occur to me. That would totally be solvable, though, just by ordering (always put the `when Symbol` above the `when String`), which is not nearly as hideous as some Ruby gotchas, and I think the case statement is generally more trouble than it's worth anyway... Ugh. – AlexChaffee Jun 19 '12 at 01:17
  • Well, you could always lobby for it to be changed, but it means that you would have to convince people to help go through the `stdlib` and change anything that might break. Also, I don't know what the implications for `YARV` might be. Still, a highly political task. – Azolo Jun 19 '12 at 01:41
  • This issue has come up on bugs.ruby-lang.org here: http://bugs.ruby-lang.org/issues/4801 – AlexChaffee Aug 26 '12 at 18:35

I don't know about a full answer, but here's a big part of it:

One of the reasons that symbols are used for hash keys is that every instance of a given symbol is the exact same object. This means :apple.object_id will always return the same value, even though you're not passing a single object around. On the other hand, "apple".object_id will return a different id every time, since a new string object is created for each literal.

That difference is why symbols are recommended for hash keys. No content-equality test needs to be done when symbols are used; the comparison can be short-circuited directly to object identity.
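A quick way to see this in irb (the exact ids vary from run to run):

:apple.object_id == :apple.object_id     #=> true, every :apple is the one interned Symbol
"apple".object_id == "apple".object_id   #=> false, each "apple" literal builds a new String object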

Emily
  • You're right, and the interpreter could easily assure that every reference to :apple returns a pointer to the same instance, even if that instance is a subclass of String. – AlexChaffee Jun 18 '12 at 16:58

Another consideration is that "apple".each_char makes sense, but :apple.each_char doesn't. A string is an "ordered list of characters", but a symbol is an atomic data point with no explicit value.
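For instance:

"apple".each_char.to_a          #=> ["a", "p", "p", "l", "e"]
:apple.respond_to?(:each_char)  #=> false, a Symbol isn't a sequence of characters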

I'd say that HashWithIndifferentAccess actually demonstrates that Ruby symbols are fulfilling two different roles: symbols (which are essentially like enums in other languages) and interned strings (which are essentially a preemptive optimisation, compensating for the fact that Ruby is interpreted and so doesn't have the benefits of an intelligent optimising compiler).

Matty K
  • Which I guess means they _could_ be reimagined as `< String`, to remove the ambiguity. In which case the answer to "why not" is as Azolo noted above. – Matty K Jun 19 '12 at 23:49

See this answer: https://stackoverflow.com/a/6745253/324978

Main reasons: performance (symbols are stored as integers, and are never garbage collected) and consistency (:admin and :admin will always point to the same object, whereas "admin" and "admin" don't have that guarantee).
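If you want to gauge the lookup-cost difference yourself, something along these lines should do it (timings will vary by machine and Ruby version):

require 'benchmark'

sym_hash = { admin: true }
str_hash = { "admin" => true }

Benchmark.bm(12) do |x|
  x.report("symbol keys") { 1_000_000.times { sym_hash[:admin] } }
  x.report("string keys") { 1_000_000.times { str_hash["admin"] } }
end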

robbrit
  • Performance could be optimized, and identity assured, during construction in the same way a lookup into the symbol table happens now when the parser encounters a symbol. I'm not saying symbols shouldn't be different from strings, I'm saying they should be *derived* from strings and therefore more useful. – AlexChaffee Jun 18 '12 at 16:57

The basic point is that these should not be true:

:apple == "apple"  #=> false, should be true

:apple.hash == "apple".hash #=> false, should be true

A given Symbol is always the same object, whereas a given piece of text is not.

Carson Cole
  • That's begging the question. "apple" == "apple" is true even though they're different instances, so why can't :apple == "apple"? – AlexChaffee Jun 18 '12 at 16:54
  • Why can't `"1" == 1` be true? It can. That's a choice that some languages have made, but it's not a choice that Ruby has made. I think to some extent this is the same situation. Matz decided that Symbols and Strings should be two different things, and so they are. – Emily Jun 18 '12 at 18:03
  • I am against making :apple == "apple" return true. A Symbol and a String are not the same or equivalent - even if there is an inheritance relation between the two, comparing two instances of different types for equivalence must never return true. – Robert Klemme Jun 20 '12 at 08:40
  • "comparing two instances of different type for equivalence must never return true" -- for counterexample, see `(1.0 == 1)` – AlexChaffee Jun 26 '12 at 16:34
  • Could it be that this was just a poor design choice by Ruby? There's really no justification for HashWithIndifferentAccess, IMO. – Joel Blum Nov 20 '19 at 14:04

If anything, a String could inherit from Symbol, because String adds a lot of functionality (mutation). But a Symbol can never be used "as a" String, because it would fail in every circumstance where mutation is needed.

In any case, as I said above, string == symbol must never return true, contrary to what has been suggested. If you think about this a bit, you'll notice that there can be no reasonable implementation of == in a class that also considers subclass instances.
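One way to see the difficulty: == is expected to be symmetric, and a subclass that loosens it breaks that. A sketch with a made-up LooseString class:

class LooseString < String
  def ==(other)
    to_s == other.to_s   # accept anything whose text matches
  end
end

sym_ish = LooseString.new("apple")
sym_ish == :apple   #=> true, the loosened == says yes
:apple == sym_ish   #=> false, Symbol#== only accepts the identical Symbol, so == is no longer symmetric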

Robert Klemme
  • Thanks for the answer, but this is kind of begging the question. There is no a priori design-level reason why strings should not be == to their symbol equivalents, just like floats are == to their fixnum equivalents now. And as [Matz notes here](http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/9192), `Symbol < String` works for Smalltalk! – AlexChaffee Jun 26 '12 at 16:32