4

All Lisp developers seem to know what an S-Expression is. But can anybody explain this for non Lisp developers?

There is already a Wikipedia entry (https://en.wikipedia.org/wiki/S-expression). But that is not really helpful if you don't want to go deeply into detail.

What is an S-Expression? What can I express with an S-Expression? For what purpose uses Lisp normally S-Expressions? Are S-Expressions only relevant for Lisp developers?

habrewning
  • 735
  • 3
  • 12
  • Code in any language that amount to a value is an expression. Lisp code is just lists with elements, a fundmental datastructure in lisp, however the plan was to use a syntax (m-expressions) more similar to Java and Python, but the initial version just evaluated the code in data form and that was called s-expressions. s-expressions make a structured tree very similar to the tree sturcture a parser of other languages would produce and thus in lisp languages one could say the code and the AST is the same thing. – Sylwester Oct 27 '22 at 23:09

4 Answers4

6

The term S-expression refers to the printed form(s) of a Lisp object. For instance, the integer zero object can appear as a written S-expression like 0, 000, or #x0. The text (0 . 1) is the S-expression denoting a cons cell object whose fields are the integers zero and one. In Common Lisp, under the default read table, the tokens Foo, fOO, FOO, |FOO| and foo, are all S-expressions denoting the same symbol. They are different read syntax, equivalent through their semantics of denoting the same object.

Why don't we just call those things expressions? Firstly, there are times when we do, when it is clear from the context that we are speaking about character syntax. The term expression is ambiguous for that reason: it can sometimes refer to a textual, printed expression that, for instance, someone typed into a text file or interactive listener. Most of the time, expression refers to a Lisp object in memory representing code.

We could say printed expression instead of S-expression, but the term is historically entrenched, dating back to a time when Lisp also had M-expressions. Plus, printed expression would only have the same meaning as S-expression when we know we are already talking about nothing but Lisp. The term S-expression in a context outside of Lisp means something like "one of the printed object notations coming from the Lisp family, with symbols written without quotes, and nested lists with parentheses in which items are separated by only whitespace."

Note that the ANSI Common Lisp standard doesn't use the terms S-expression or symbolic expression. No such terms appear in the Glossary, only expression, which is defined like this:

expression n. 1. an object, often used to emphasize the use of the object to encode or represent information in a specialized format, such as program text. "The second expression in a let form is a list of bindings." 2. the textual notation used to notate an object in a source file. "The expression 'sample is equivalent to (quote sample)."

S-expression is more or less the (2) meaning, with historic ties and a broader interpretation outside of any one Lisp dialect. For instance, Ron Rivest, perhaps best known as one of the authors of the RSA cryptosystem. wrote an Internet Draft describing a form of S-expressions for data exchange.

Kaz
  • 55,781
  • 9
  • 100
  • 149
  • What can I express with an S-Expression? For what purpose uses Lisp normally S-Expressions? Are S-Expressions only relevant for Lisp developers? Why was there even an Internet Draft for something, that is only ever used in one programming language? – habrewning Feb 05 '23 at 08:53
  • @habrewning S-expressions can be used to communicate and store structured data, similarly to JSON and XML. Lisp isn't one programming language, but a family. Comon Lisp has its own S-expressions, so does Scheme and so on, and different dialects have their own extensions: e.g. Gauche Scheme and Chicken Scheme S-expressions are not entirely the same. For language- and platform-independent communication, you would certainly need some standard. Look at JSON, which is a separate specification from Javascript. – Kaz Feb 05 '23 at 23:40
5

An S-expression is the fundamental unit of storage in Lisp. By the original definition, an S-expression is one of two things.

  • An atom, or
  • a cons cell

An atom is the base case. In classical Lisp (the original language proposed by John McCarthy), an atom is just a distinct unit, that we conventionally designate with a name. Conceptually, you can think of it as a string, even though that's not how any modern Lisp would store it internally. So foobar is an atom, and so is potato. They're just strings that are atomic, in the sense that they don't recursively contain any more S-expressions.

Note that modern Lisp dialects extend the definition of "atom" to include things like numbers, so in Common Lisp, 1.0 would be a valid atom which represents a number.

A cons cell is the fundamental unit of composition in Lisp. A cons cell is a structure that points to two other S-expressions. We call the first of these S-expressions the car and the second the cdr. These names are archaic and were originally references to how cons cells were stored on old computers, but Lisp programmers today still use them. You'll hear some people refer to the car as the "first" or "head", and you'll hear some people refer to the cdr as the "tail" or "rest". (Try not to refer to the cdr as the "second" term, as that's ambiguous and could be interpreted as something else, which we'll talk about in a minute)

Now, we write cons cells in parentheses with a dot between them. So a cons cell where the car and cdr are both atoms would look like

(foo . bar)

This is a cons cell whose car is the atom foo and whose cdr is the atom bar. We can also nest cons cells.

((foo . bar) . (baz . potato))

And then we end up with a sort of binary-tree-like structure, where each branch has a left and a right (a car and a cdr, in our terminology), and each leaf is an atom.

So what can we do with this? Well, for one, we can store linked lists. There are several ways to do this, but the prevailing convention in the Lisp community is to use the car to store the current value and the cdr to store the cons cell pointing to the rest of the list. Then, when we reach the end of the list (where we might store a null pointer if we were doing this in C or Java), we pick out a particular atom, called NIL. There's nothing special about the NIL atom in the definition above; we just picked one out because we needed one to use as a convention.

So to represent the list [a, b, c, d], we would store it as

(a . (b . (c . (d . NIL))))

The car of the outermost cons cell is the first element of the list, or a. The cdr stores the rest of the list. The car of the cdr is the second element, b, and so on. (This is why I said not to call the cdr the "second" element, as "second" is often used to mean "car of cdr")

In fact, we do this so often that there's another notational convention in Lisp. If the cdr is another cons cell, then we simply drop the . and the parentheses and understand what it means. So, in general, we say that the following two are equivalent, for any S-expressions a, b, and c.

(a . (b . c)) === (a b . c)

Again, I haven't changed the definition. There's still only two valid S-expressions: atoms and cons cells. I've just invented a more compact way to write them.

Likewise, since we're going to be using NIL a lot to end lists, we simply drop it. If we have a NIL as the cdr of a cons cell, then by convention we remove the . and the NIL. So the following are equivalent for any S-expression a.

(a . NIL) === (a)

Again, I'm just inventing new, compact ways to write things, not changing the definitions.

Finally, as a notational convenience, we might sometimes write the atom NIL as a pair of empty parentheses, since it's supposed to look like the empty list.

NIL === ()

Now, looking at our list from earlier

(a . (b . (c . (d . NIL))))

we can use these rules to simplify it

(a . (b . (c . (d . NIL))))
(a b . (c . (d . NIL)))
(a b c . (d . NIL))
(a b c d . NIL)
(a b c d)

And now this looks remarkably like Lisp syntax. And that's the beauty of S-expressions. The Lisp code you're writing is just a bunch of S-expressions. For example, consider the following Lisp code

(mapcar (lambda (x) (+ x 1)) my-list)

This is ordinary Lisp code, the kind you would see in any everyday program. In Common Lisp, it adds one to each element of my-list. But the beauty is that it's just a big S-expression. If we remove all of the syntax sugar, we get

(mapcar . ((lambda . ((x . NIL) . ((+ . (x . (1 . NIL))) . NIL))) . (my-list . NIL)))

Not pretty, at least aesthetically, but now it's easier to see how this really is just a bunch of cons cells terminated in atoms. Your entire Lisp syntax tree is just that: a binary tree full of code. And you can manipulate it as such. You can write macros that take this tree, as a data structure, and do whatever on earth they want with it. The abstract syntax tree of your Lisp program isn't some opaque construct internal to the language; it's just a tree: an incredibly simple data structure that you already use in everyday programming anyway. The same lists and other structures that you use to store data in your Lisp program are also used to store code.

Modern Lisp dialects extend this with new conventions and, in some cases, new data types. Common Lisp, for instance, adds an array type, so #(1 2 3 4 5) is an array of five elements. It's not a linked list (since, in practice, linked lists are slow for random access), it's something else entirely. Likewise, Lisp dialects add new conventions on top of the NIL ones we've already discussed. In most Lisp dialects, the apostrophe, or single quote, is used to represent a call to the quote special form. That is,

'x === (quote x) (quote . (x . NIL))

For any S-expression x. Different dialects add different features to the original McCarthy definition, but the core concept is: What is the absolute minimum definition we need to be able to comfortably store the code and data of our Lisp program.

Silvio Mayolo
  • 62,821
  • 6
  • 74
  • 116
2

The other answers are very Lisp-specific, but actually S-expressions are useful outside of the Lisp world.

An S-expression is a (convenient) way of representing a tree whose leaf are symbols (names, strings, call them how you like). Each parenthesized part of an S-expression is a node, containing exactly the list of its children.

Example: (this (s expression) (could (be represented)) as (this tree))


       [..........]
       /|   | |  |
      / .   | as .
     / / \  |   / \
    /  s |  . this |
  this   |  |\    tree
         |  | \
 expression |  \
          could .
                |\
               be represented

In Lisp, the tree represented by S-expression corresponds to the Concrete Syntax Tree, which is why Lisp is so easy to parse.

However, since this representation of trees is convenient (it's relatively compact, it's very human-friendly and it's straightforward both to parse and to produce for a machine), it's also used in other contexts. For instance, Ocaml's Core library (which is an alternative standard library for that language) provides serialization and deserialization as S-expressions.

Besides this, Lisp also names some of its data structures S-expressions. This goes well with Lisp's homoiconicity, that is, the fact that the code can be manipulated pretty much like data.

So, to answer your questions:

  • S-expressions are both a syntactic way to represent trees and a tree data structure in Lisp.
  • With S-expressions you can express trees; the meaning you attach to the tree (its interpretation, if you will) is not specific to S-expressions. S-expression tell you how to write a tree, not what it means — and, in fact, people use them for different purposes, with different meanings.
  • Lisp uses S-expressions both to represent its own source code, to print values and as a data structure, recursively built from nil and cons (the exact details of all of this vary a lot between all the Lisp dialects).
  • S-expressions are not only relevant for Lisp developers, see for example the Ocaml serializing / deserializing library Sexp. In practice, other ways of representing data that have stronger typing are more commonly used where S-expressions could be used, such as JSON.
jthulhu
  • 7,223
  • 2
  • 16
  • 33
  • OCaml's definition of S-expressions is very narrow (either strings or list of expressions), which is maybe why you are saying that they are not equally useful to represent more strongly typed values (like JSON, where you have numbers and dictionaries), but to me this is irrelevant; first there are definitions of S-expressions that can embed various kinds of atoms and secondly, even when you have only strings you can still attach type metadata: `((map string single-float) (a 1e12) (b 0.5))` is a valid way to represent a hash-map from strings to float values. – coredump Oct 24 '22 at 11:42
  • @coredump this is not quite what I meant: in JSON, "6" is a string no matter how you interpret it, whereas 6 is a number; with S-expression, there is no way to distinguish between a number and a string without "interpreting" it. But my point was more: JSON *is* more used than S-expressions, and one reason *might* be this requirement of interpreting (at least to some extent) an S-expression to understand the "type" of the represented data. The fact that OCaml decides to interpret them as trees of strings was not my point. – jthulhu Oct 24 '22 at 16:05
  • OCaml was just an example of S-expressions being used somewhere else than in Lisp. – jthulhu Oct 24 '22 at 16:07
  • "with S-expression, there is no way to distinguish between a number and a string without "interpreting" it": it depends on what you call an S-expression; if your definition of S-expr is a tree of strings, then yes; if you support more types then no, you can have a dedicated `@2022-10-24T16:24:12.913732Z ` syntax that reads dates (something JSON does not have). So my point I think is that S-expr is used quite loosely to talk about many things; there is at least one attempt to formalize the format: http://people.csail.mit.edu/rivest/Sexp.txt which is a bit like the JSON RFC – coredump Oct 24 '22 at 16:35
  • I would think that for some time now in an s-expression `6` is a number and `"6"` is a string. – Rainer Joswig Oct 26 '22 at 06:54
2

s-expressions are short for Symbolic Expressions.

Basically they are Symbols and nested lists of Symbols.

A Symbol is made of alphanumeric characters.

Examples for symbols and nested lists of symbols:

foo
berlin
fruit
de32211
(apple peach)
(fruit (seller fruit-co))
((apple one) (peach two))

these lists were made of cons cells written as (one . two) and nil as the empty list.

Examples:

(a . (b . nil))  -> (a b)
((a . nil) (b . nil))   -> ((a) (b))

The programming language Lisp (short for List Processor) was designed to process these lists. Lisp contains all kinds of basic operations dealing with nested lists. There the elements of s-expressions can also be numbers, characters, strings, arrays and other data structures.

Symbolic Expressions have the same purpose as JSON and XML: they encode data.

Symbolic Expressions in Lisp also have the purpose to encode Lisp programs themselves.

Example:

((lambda (a b)
   (+ a (* 2 b)))
 10
 20)

Above is both an s-expression and a valid Common Lisp / Scheme program.

Symbolics Expressions were thought to be an universal notation for humans and machines to read/write/process all kinds of data in some calculus.

For example s-expressions could encode a mathematical formula, a Lisp program, a logic expression or data about the configuration of a planning problem. What was missing at the time was a way to describe declaratively valid data schema. s-expressions were typically processed and validated procedural.

How are s-expressions used in Lisp?

  • for encoding source code
  • for all kinds of data
  • mixed source code and data

Are S-Expressions only relevant for Lisp developers?

Mostly, but sometimes code or data exists in the form of s-expressions and programs written in other languages than Lisp want to process this data. Sometimes even developer not using Lisp are choosing s-expressions as a data representation format.

Generally the usage of s-expresssions outside of Lisp is rare. Still, there are a few examples. XML and JSON got much more popular than s-expressions.

Rainer Joswig
  • 136,269
  • 10
  • 221
  • 346