
I've been with Ruby for about a year now and have a language question: are symbols necessary because Ruby strings are mutable and not interned?

In, say, Java, strings are immutable and interned. So "foo" is always equal to "foo" in value and reference and its value cannot change. In Ruby, strings are mutable and not interned, so "a".object_id == "a".object_id will be false.

If Ruby had implemented strings like Java, symbols wouldn't be necessary, right?

the Tin Man
Steve Potter
  • Sidenote: As of 2.3 you can supply a flag for immutable String literals, `RUBYOPT=--enable-frozen-string-literal`, which makes all literal strings (e.g. "This") frozen and immutable. This change is currently planned to be the default for Ruby 3.0, but it does not dispose of symbols. Symbols have their own place in Ruby beyond "string" functionality. Take `Symbol#to_proc`, for example, which is extremely popular syntax: how would one deal with this as a string? – engineersmnky Jun 19 '17 at 15:52
  • This comment by @engineersmnky is the best answer here so far and I believe it should be converted into an answer. `Symbol#to_proc` is worth pretty much everything in Ruby. – Aleksei Matiushkin Jun 19 '17 at 16:16
  • @mudasobwa adapted to an answer with a bit more context – engineersmnky Jun 19 '17 at 16:49

4 Answers


As of Ruby 2.3, immutable String literals are available as an opt-in via the RUBYOPT flag --enable-frozen-string-literal, i.e.

RUBYOPT=--enable-frozen-string-literal ruby /some/file

This will cause all String literals (strings created using "", %q{}, %Q{}, or "#{}" styles) to become immutable. Making this the default is currently being considered for Ruby 3.0; follow along with Feature#11473. The feature is also available at the file level, rather than globally, as a "magic comment":

# frozen_string_literal: true

This has the same effect as the RUBYOPT version but applies only to that specific file. (One other way is to interact with the VM directly: RubyVM::InstructionSequence.compile_option = {frozen_string_literal: true}.)

Since this is optional, it can be turned on and off, and it will still be an option in 3.0, just defaulting to on instead of off. Mutable Strings can still be created using String.new, and immutable Strings can be duped to get a mutable counterpart. (Note: interpolation "#{}" creates a new immutable String as well, because it is written with "".)
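As a rough sketch of the behaviour described above: the snippet below freezes a literal explicitly with .freeze, which stands in for what the flag or magic comment would do to every literal. The constant GREETING and the helper try_append are names made up for this illustration.

```ruby
# GREETING is frozen by hand here; under frozen_string_literal the
# literal would already be frozen without the explicit .freeze call.
GREETING = "hello".freeze

# Hypothetical helper: appends suffix in place and returns the String,
# or returns the class of the error if the String cannot be mutated.
def try_append(str, suffix)
  str << suffix
rescue => e # FrozenError on Ruby 2.5+, a plain RuntimeError before that
  e.class
end

try_append(GREETING, "!")             # the frozen String refuses the change
try_append(GREETING.dup, "!")         # dup returns an unfrozen copy
try_append(String.new("hello"), "!")  # String.new returns a mutable String
```

The first call comes back with an error class, while the other two return the mutated copy and leave GREETING untouched.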

All that being said, it does not replace the need for Symbols in Ruby. First of all, the underlying C that powers Ruby leverages Symbols heavily via rb_intern to handle references for things like method definitions. (These have been titled "Immortal Symbols" and will never be GCed.)

Additionally, Symbols, like all things in Ruby, are their own Object and have their own useful set of functionality. Take Symbol#to_proc, for example. This originated as a monkey patch for syntactic ease and was absorbed into core in 1.8.7. This style is highly encouraged and regularly used by the Ruby community as a whole. Please consider how you would make this feature degrade to working with a String instead of a Symbol.
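For readers who have not met it, here is a small sketch of what Symbol#to_proc buys you. The method names shout and total are invented for this example.

```ruby
# &:upcase calls Symbol#to_proc and passes the resulting proc as the block,
# so this is shorthand for words.map { |w| w.upcase }.
def shout(words)
  words.map(&:upcase)
end

# Some Enumerable methods also accept a bare Symbol naming the operation.
def total(numbers)
  numbers.reduce(:+)
end

# The proc can be built and called directly as well.
upcase = :upcase.to_proc
upcase.call("ruby")
```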

Symbols used to be considered somewhat "dangerous" (for lack of a better word) due to their interning and memory consumption in combination with the dynamic nature of Ruby. As of Ruby 2.2, however, most Symbols (see above for the exception) can be garbage collected, i.e. Symbols created inside Ruby through String interning (#intern, #to_sym, etc.). (These have been coined "Mortal Symbols".)

Minor caveats include things like

 define_method(param[:meth].to_sym) {}

Because this calls to_sym, it looks like it should produce a "Mortal Symbol", but since define_method calls rb_intern to keep the method reference, it actually creates an "Immortal Symbol".

Hopefully this rundown helps explain the necessity of Symbol in Ruby, not only from a developer standpoint but also because of its heavy usage in the C internals of Ruby's implementation.

engineersmnky

Pretty much, yes. My understanding is that the interpreter keeps a table of symbols, which can grow dynamically. This is why you must never accept user input and convert it unchecked into symbols: an attacker can keep growing that table, in what's called a symbol overflow attack.

I believe the symbol overflow vulnerability was patched in Ruby 2.2.

see Getting warning : Denial of Service
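One common way to avoid converting unchecked input is to compare it against a fixed list first. This is only a sketch; ALLOWED_SORT_KEYS and safe_sort_key are hypothetical names, and the allowed values are made up.

```ruby
# Hypothetical whitelist of values we are willing to turn into Symbols.
ALLOWED_SORT_KEYS = %w[name email created_at].freeze

# Returns a Symbol only for known-good input, nil for anything else,
# so arbitrary user input never reaches to_sym.
def safe_sort_key(input)
  ALLOWED_SORT_KEYS.include?(input) ? input.to_sym : nil
end

safe_sort_key("name")        # a known key becomes a Symbol
safe_sort_key("anything_#{rand(1000)}")  # unknown input is rejected
```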

Jason FB
  • i never knew about symbol overflow attacks! how interesting! – eiko Jun 19 '17 at 15:33
  • With a modern Ruby, this is not as much of a problem as it used to be (that is, creating a symbol is now about as expensive as creating any other object). Earlier versions of Ruby kept existing Symbols in memory indefinitely. Since Ruby 2.2, however, symbols are also garbage collected, similar to (almost) any other object. – Holger Just Jun 19 '17 at 15:46
  • @eiko -- run Brakeman (https://github.com/presidentbeef/brakeman) on your Rails app and you'll learn a bunch of things! As Holger Just said, this applies only to Ruby versions < 2.2. – Jason FB Jun 19 '17 at 20:50

Java-like strings would replace the functionality of symbols, so in that sense you are correct. However, I don't think Matz would be happy with a language which only has immutable and interned strings.

With strings and symbols, Ruby offers the best of both worlds. Symbols provide memory-efficiency for read-only strings like hash keys, whereas mutable strings are memory-efficient for string operations like concatenation.
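A quick sketch of that trade-off, with method names invented for the illustration: a Symbol literal is always the same object, two separately built Strings are not, and a mutable String can grow in place.

```ruby
# Every occurrence of a Symbol literal refers to one shared object.
def same_symbol?
  :status.equal?(:status)
end

# Unfrozen Strings with equal content are still distinct objects.
def same_string?
  String.new("status").equal?(String.new("status"))
end

# A mutable String can be extended in place without allocating
# a new String for every intermediate result.
def join_in_place(parts)
  buffer = "".dup
  parts.each { |part| buffer << part }
  buffer
end
```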

So maybe "if Ruby had implemented strings like Java" isn't the right train of thought. Ruby did implement strings like Java. And they're called "symbols." Then Ruby implemented a second type of string, which it calls a "string." The naming is purely aesthetic, but I think it makes sense.

the Tin Man
eiko
  • There is also a middle ground: frozen Strings. With a modern Ruby, frozen string literals are even handled almost the same as Symbols (e.g. they also get interned, as symbols do) and share the same performance advantages. Because of that, there is a growing number of people who advocate getting rid of the concept of Symbols in Ruby altogether in favour of frozen strings. You are right, however, in that the distinction between (generally mutable) Strings and immutable interned Symbols used to be much more prominent. – Holger Just Jun 19 '17 at 15:50
  • @HolgerJust “there is a growing number of people who advocate getting rid of the concept of Symbols in Ruby altogether in favour of frozen strings” What?! That is nonsense, since getting rid of `Symbol`s would ruin approximately 100% of existing code (which in turn relies everywhere on `Symbol#to_proc` behaviour in code like `[1,2,3].reduce(:+)`). – Aleksei Matiushkin Jun 19 '17 at 16:31
  • Well, it's still a discussion and is quite controversial at that... One proposal would be to make symbols in code just syntactic sugar for frozen strings. And realistically, most of the advantages of Symbols are already implemented for frozen strings, including usage in Hash keys, literal creation, ... As for `Symbol#to_proc`: nothing prevents anyone from implementing `String#to_proc` the same way as it's done for Symbols today. – Holger Just Jun 19 '17 at 16:36
  • @mudasobwa And it's even just one line of code :) `String.class_eval { define_method("to_proc") { s = self; proc{ |o, arg| o.public_send(s, arg) } } }` allows you to run `[1,2,3].reduce(&"+")` and get the same result as `[1,2,3].reduce(:+)`. – Holger Just Jun 19 '17 at 16:48
  • @HolgerJust that is just a simple example of useful `Symbol` syntax sugar. How about you take a look at `rb_intern()` in the `C` code and how the loss of `Symbol` interning might impact that? – engineersmnky Jun 19 '17 at 16:50
  • What I originally wanted to say was: the differences between Symbols and frozen Strings are getting smaller with every new version of Ruby, including how they are interned. There is (or was) a discussion about how Symbols and Strings can be unified, but it's not something that (to my knowledge) is actively pursued as an actual goal in Ruby core development right now. There are still major areas in Ruby where the unification is not (yet) useful or possible, but we are getting nearer to that constantly. – Holger Just Jun 19 '17 at 16:57

I've been with Ruby for about a year now and have a language question: are symbols necessary because Ruby strings are mutable and not interned?

No.

Symbol and String are simply two different data types. String is for text, Symbol is for labels.

In, say, Java, strings are immutable and interned.

No, they are not. They are immutable and sometimes interned, sometimes not. If Strings were interned, then why is there a method java.lang.String.intern() which interns a String? Strings in Java are only interned if

  • you call java.lang.String.intern() or
  • the String is the result of a String literal expression or
  • the String is the result of a String-typed constant value expression

Otherwise, they are not.

So "foo" is always equal to "foo" in value and reference and its value cannot change.

Again, this is not true:

class Test {
  public static void main(String... args) {
    System.out.println("foo".equals(args[0]));
    System.out.println("foo" == args[0]);
  }
}

Call it with

java Test foo
# true
# false

In Ruby, strings are mutable and not interned, so "a".object_id == "a".object_id will be false.

In modern Ruby, that is not necessarily true either:

# frozen_string_literal: true
"a".object_id == "a".object_id
#=> true

If Ruby had implemented strings like Java, symbols wouldn't be necessary, right?

No. Like I said, they are different types for different use cases.

Take a look at Scala, for example, which implements "strings like Java" (in fact, on the JVM implementation of Scala, there is no String, Scala String simply is java.lang.String). Yet, it also has a Symbol class.

Likewise, Clojure has not one but two datatypes like Ruby's Symbol: keywords are exactly equivalent to Ruby's Symbols, they evaluate to themselves and stand only for themselves. Symbols OTOH may stand for something else.

Erlang has immutable strings and atoms, which are like Clojure/Lisp symbols.

ECMAScript has immutable strings and recently added a Symbol datatype. They are not 100% equivalent to Ruby Symbols, though, since they have an additional guarantee: not only do they evaluate only to themselves and stand only for themselves, but they are also unforgeable (meaning it is impossible to create a Symbol which is equal to another Symbol).

Note that Ruby is moving away from mutable strings:

  • Ruby 2.1 optimizes the pattern 'literal string'.freeze to return a frozen string from a global string pool.
  • Ruby 2.3 introduces the # frozen_string_literal: true pragma and --enable=frozen-string-literal feature toggle switch to make all string literals frozen (and pooled) by default on a per-script (pragma) or per-process (feature toggle) basis.
  • Ruby 3 will switch the default for both of those to true, so that you have to explicitly say # frozen_string_literal: false or --disable=frozen-string-literal in order to get the current behavior.
  • Some later version will remove support for mutable strings altogether.
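To make the first bullet concrete, here is a minimal sketch of the 'literal'.freeze pooling as I understand it to behave on MRI 2.1 and later. The method names pooled_label and fresh_label are made up for this example.

```ruby
# A frozen literal comes from the interpreter's string pool, so repeated
# evaluation hands back the same object (behaviour assumed for MRI 2.1+).
def pooled_label
  "label".freeze
end

# A String built at runtime is a fresh, mutable object on every call.
def fresh_label
  String.new("label")
end

pooled_label.equal?(pooled_label)  # same object each time
fresh_label.equal?(fresh_label)    # different objects with equal content
```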
Jörg W Mittag