
I'm in the process of learning Clojure and I can't understand some language design decisions: why does a language with immutable strings like Clojure also need Keyword and Symbol data types? Couldn't strings just have optional namespaces, metadata and all this stuff? For immutable strings, comparison could just as well be identity-based, no?

Or, since interop with Java is a must-have for Clojure, why not at least have just the Java String type plus a single KeywordSymbol data type?

I find this String/Keyword/Symbol "trichotomy" especially weird since Clojure seems very focused on "purity" and keeping things simple in other aspects.

NeuronQ
  • See also: [Why does Clojure have “keywords” in addition to “symbols”?](http://stackoverflow.com/q/1527548/405550) – Zaz Jul 29 '16 at 08:09
  • No, immutability doesn't mean identity comparison is enough, you can still have multiple immutable instances with the same content. That assumption only holds for interned strings. – David Ongaro Jan 23 '18 at 14:07
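To make the immutability-versus-identity point concrete, here is a small REPL sketch (not part of the original thread; it assumes Clojure on the JVM, and `s1`, `s2`, `k1`, `k2` are illustrative names). Two immutable strings with the same content are equal but are distinct objects, whereas Clojure interns keywords, so the same name always yields the very same keyword object:

```clojure
;; Two strings built at run time: same content, but separate objects.
(def s1 (str "ab" "c"))
(def s2 (str "a" "bc"))

(= s1 s2)          ; value equality     => true
(identical? s1 s2) ; reference identity => false

;; Keywords are interned by Clojure, so equal names give one shared
;; object and identity comparison is safe for them.
(def k1 (keyword s1))
(def k2 (keyword s2))

(identical? k1 k2) ; => true
```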

2 Answers


They fill very different roles within the language:

  • Vars are used to give names to things. They implement `Runnable` (through `IFn`) and can be used directly to invoke functions. You cannot run a string.
  • Keywords are names in their own right, and they look themselves up in maps. They really help Clojure keep its "data-driven" flavor. Strings do not implement the required interface (`IFn`) to look themselves up in maps.
  • Strings are just strings. They do what they need to do and not much more.
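A minimal sketch of the three roles listed above, assuming Clojure on the JVM (`m` and `call-string` are illustrative names, not library functions). A keyword called on a map looks itself up, a string must be looked up explicitly and cannot be called, and a Var can be invoked directly:

```clojure
;; A map with both a keyword key and a string key.
(def m {:name "keyword value" "name" "string value"})

(:name m)          ; keyword looks itself up => "keyword value"
(get m "name")     ; strings need explicit get => "string value"

;; Calling a string as a function fails at run time, because
;; java.lang.String does not implement clojure.lang.IFn.
(defn call-string [s coll]
  (try
    (s coll)
    (catch ClassCastException _ :not-callable)))

(call-string "name" m)                   ; => :not-callable

;; A Var can be invoked like the function it holds, and it is
;; also a java.lang.Runnable.
(#'clojure.core/inc 41)                  ; => 42
(instance? Runnable #'clojure.core/inc)  ; => true
```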

One of the core principles in the design of Clojure was to embrace the host platform. Thus Clojure strings are Java strings, and you never need to wrap a Java string in some convert-to-Clojure-string function to bring it into the Clojure ecosystem. This required using unmodified Java strings, as well as Java's numeric types. Keywords and symbols, on the other hand, are new constructs added by Clojure, so it is only necessary to make them accessible in a useful way from the rest of the Java ecosystem. They do that simply by being classes that implement an interface. It was believed at the outset that for a new language to succeed in the JVM ecosystem, it needed to fully embrace Java and minimise the "impedance mismatch" (sorry for the buzzwordism), even if that meant adding more to the language than would otherwise have been required.
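To illustrate the interop point, the following sketch (assuming Clojure 1.x on the JVM) shows that a Clojure string is simply a `java.lang.String` that Java methods accept directly, while keywords and symbols are Clojure's own classes that expose their behavior through the `clojure.lang.IFn` interface:

```clojure
;; Strings are plain java.lang.String objects, with no wrapper class.
(class "hello")                       ; => java.lang.String
;; Java methods can be called on them directly.
(.toUpperCase "hello")                ; => "HELLO"

;; Keywords and symbols are classes added by Clojure; they plug into
;; the JVM world by implementing ordinary interfaces such as IFn.
(class :hello)                        ; => clojure.lang.Keyword
(class 'hello)                        ; => clojure.lang.Symbol
(instance? clojure.lang.IFn :hello)   ; => true
(instance? clojure.lang.IFn 'hello)   ; => true
(instance? clojure.lang.IFn "hello")  ; => false
```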

Edit:

You can sort of turn a symbol into a keyword by `def`ing it to itself:

user> a
; Evaluation aborted.
user> :a
:a
user> (def a 'a)
#'user/a
user> a
a
user> 

Keywords evaluate to themselves.
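As NeuronQ points out in the comments, `def`ing `a` to `'a` only makes the symbol evaluate to itself; the value is still a symbol. A short sketch (not from the original answer) of self-evaluation and of an actual conversion, using the standard `keyword`, `name` and `symbol` functions from `clojure.core`:

```clojure
;; A keyword evaluates to itself, with no quoting needed.
(eval :a)            ; => :a
;; A symbol is resolved when evaluated; quoting stops that.
(eval ''a)           ; => a  (the symbol itself)

;; def-ing a symbol to itself makes it self-evaluating,
;; but the value is still a symbol, not a keyword.
(def a 'a)
(symbol? a)          ; => true

;; An actual conversion uses `keyword`, directly or via the name.
(keyword 'a)         ; => :a
(keyword (name 'a))  ; => :a
(symbol (name :a))   ; => a  (and back again)
```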

Arthur Ulfeldt
  • This doesn't help with why keywords and strings have to be separate. In Ruby, for example, strings and symbols are both necessary because one is mutable and the other is not. Why couldn't you just have keywords or strings, and that one has the interface for looking themselves up? – Chris Jul 25 '12 at 17:58
  • I really like the fact that keywords look themselves up in maps and these types of behavior, but you could just as well have strings capable of the same feats. And about implementing runnable, maybe I just don't have a deep enough understanding of homoiconicity and "code that writes code" stuff, because I just see that things like: (1) `(def x +) (x 2 3)` and (2) `(def x '+) (x 2 3)` do the same thing anyway (where "+" stands for any function name) – NeuronQ Jul 25 '12 at 18:29
  • i'll add more about the design decision behind this – Arthur Ulfeldt Jul 25 '12 at 18:36
  • maybe i'm missing something, but i don't see what the case is for making keywords distinct from quoted symbols. couldn't you replace `:` with `'` and get rid of keywords? am i misunderstanding quoting? – andrew cooke Jul 25 '12 at 22:07
  • I'll append the difference to my answer – Arthur Ulfeldt Jul 25 '12 at 22:28
  • oh that's more complicated than i was thinking. i just meant instead of `{:a 1 :b 2}` use `{'a 1 'b 2}`. for example `({'a 1 'b 2} 'a)` returns `1`. so the character ":" has the same end result as the character "'" (although more widely they differ). but it's true that a quoted symbol doesn't evaluate to anything (is that important / useful)? so even with your comment i don't understand why clojure has keywords (rather than just doing what i did above) (but if there was a use for evaluating to yourself i guess that would explain it). – andrew cooke Jul 25 '12 at 23:51
  • @andrewcooke you say a symbol doesn't evaluate to anything? If I type `'x` in the REPL I get `x` so my conclusion would be something like "a symbol evaluates to itself, i.e. a symbol, but the symbol does not get resolved" ...my head is spinning now... – NeuronQ Jul 26 '12 at 07:21
  • @ArthurUlfeldt ...wouldn't you actually have to `(keyword (name a))` or `(keyword (name 'a))` (which evaluate to the same thing, as expected) to actually turn a symbol into a keyword? And I think andrew cooke meant `('a {'a 1 'b 2})` which indeed works, evaluating to 1, so *symbols can also look themselves up in maps* – NeuronQ Jul 26 '12 at 07:28
  • I think it's focusing on clarity of purpose. Keywords are designed to be data look ups, symbols are designed for compile time look ups. You have a guarantee that there is only one keyword with a given name, but that guarantee doesn't exist for maps. I think other reasons why Strings weren't just extended is due to keywords capable of being namespaced and due to the custom interning behavior keywords use. – deterb Jul 26 '12 at 15:16
  • @ArthurUlfeldt I chose your answer, though I'm still in the fog a bit... maybe once I get a better grip of Clojure and have some time to peek into its internals some things will get clearer... If I ever find a more satisfactory answer for this I'll post an "answer your own question" to share the insight. I'll take the **"they fulfill different roles and Strings absolutely have to be compatible with Java Strings"** take home message from this. Thanks! – NeuronQ Jul 28 '12 at 05:51
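The thread above touches on two behaviors that are easy to check at the REPL. The sketch below is illustrative only (`f` and `g` are made-up names). It shows that symbols do look themselves up in maps, and that calling a quoted symbol such as `'+` performs a map lookup rather than calling the `+` function. With two arguments, the second acts as a "not found" default, so the two forms in NeuronQ's comment actually return different results:

```clojure
;; Symbols implement IFn and look themselves up in maps, like keywords.
('a {'a 1 'b 2})   ; => 1
;; Maps are also functions of their keys.
({'a 1 'b 2} 'a)   ; => 1

;; Binding the function versus binding the quoted symbol:
(def f +)          ; f holds the + function
(def g '+)         ; g holds the symbol +

(f 2 3)            ; calls + => 5
;; Symbol invocation is (get 2 '+ 3): 2 is not a map, so the
;; "not found" default 3 is returned.
(g 2 3)            ; => 3
```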

I think Clojure values "practicality" (if that's the correct word) somewhat more than "purity". This can be seen in the fact that Clojure has literal syntax for maps, vectors and sets in addition to lists, and uses it to define the language. In Scheme, which is much more concerned with purity (IMO), you only have syntax for lists.
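A quick illustration of the literal syntax mentioned above (an illustrative sketch; `form` is a made-up name). Besides lists, the reader understands vectors, maps and sets, and ordinary Clojure code such as a `defn` form is built from them:

```clojure
;; Literal syntax for each collection type:
(list? '(1 2))     ; => true
(vector? [1 2])    ; => true
(map? {:a 1})      ; => true
(set? #{1 2})      ; => true

;; The language itself is written with these literals: in a quoted
;; defn form, the parameter list is a vector and the body is a list.
(def form '(defn add [x y] (+ x y)))
(vector? (nth form 2))  ; => true
(list? (nth form 3))    ; => true
```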

As Arthur Ulfeldt points out, strings, keywords and symbols each have their intended use cases, and using them as intended makes Clojure code easier to read. It's similar to what HTML5 did by adding semantic markup: elements like `<article>` and `<section>`, which in HTML 4 you would represent with `<div class="article">` and `<div class="section">`.

Oh, and you're wrong about comparing strings just by identity: this is guaranteed to work only for interned strings. And you don't want to intern too many strings, as on older JVMs they are stored in the so-called permgen, which is quite limited in size and never garbage collected. (Since Java 7, interned strings are allocated on the regular heap instead.)
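A small sketch of the interning point, assuming Clojure on the JVM, where `.intern` is the standard `java.lang.String#intern` method (`left` and `right` are illustrative names). Interning returns one canonical instance per content, which is what makes identity comparison reliable; for everyday code, value comparison with `=` is what you want:

```clojure
;; Two equal strings created at run time are distinct objects.
(def left  (str "inte" "rned"))
(def right (str "inter" "ned"))
(identical? left right)                  ; => false

;; String.intern returns the single canonical instance for a content,
;; so interned copies are identical.
(identical? (.intern left) (.intern right)) ; => true

;; Value equality works without any interning.
(= left right)                           ; => true
```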

ivant
  • I think in some newer versions of the JVM, interned strings are eligible for garbage collection if all references to them are lost. I can't find a credible source for this right now, so I could be wrong. Either way, it's definitely true that you shouldn't routinely compare strings with `==`. – amalloy Jul 25 '12 at 20:04
  • Interesting, I didn't know about the JVM's interned strings (as you probably figured out, I'm not a Java guy and I'm coming to Clojure from a completely different background; maybe this is why I find some Java/JVM-influenced language design decisions weird :) ) ...maybe I should've started playing with a Scheme before Clojure, to get a better "intuition" about Lisps... – NeuronQ Jul 26 '12 at 07:33