In OCaml,
let sub x y = x - y
is a shorthand notation (the syntactic sugar) for
let sub = fun x -> fun y -> x - y
To get it, let's pick an even simpler example,
let succ x = x + 1
is the shorter syntactic form for
let succ = fun x -> x + 1
which says that we bind the name succ
to a function fun x -> x + 1
, where the latter is the function literal. A literal in programming languages is the syntactic notation for defining values. For example, for integers we have a finite set of literals 1
, 2
, 3
, etc. For strings, we have a practically infinite set of literals, e.g., "hello"
, ""
, "etc"
. A function in OCaml is also a value, the same as an integer or a string, therefore it has a syntax for function literals, which is
fun <arg> -> <body>
It creates a function, that evaluates to the value of <body>
in which all free occurrences of <arg>
are substituted be the actually passed parameter.
As you probably noticed, we could really bind only one parameter in the function literal. Indeed, all functions in OCaml are unary. You may already know that you can define functions like fun x y -> x - y
, but this is also just syntactic sugar for fun x -> fun y -> x - y
. So what the latter notation means? Let's ascribe parenthesis to get the idea of how this function is evaluated:
fun x -> (fun y -> x - y)
So we can recognize the fun x -> <body>
part, where <body>
is a function on itself, i.e., <body> = fun y -> x - y
. That means that expression fun x -> fun y -> x - y
is a function, that takes an argument x
and returns another function, that takes an argument y
and returns x-y
. This way of defining functions, i.e., when instead of having functions that accept tuples you have a function that returns a function and so on, is called currying.
Now, when we are comfortable with the idea of currying and that all functions in OCaml are really of one argument, we can go back to the original example of the functional dictionaries.
The add function
let add (d : 'a dict) (k, v) : 'a dict = fun k' -> if k' = k then v else d k'
stripped of the (unnecessary) type annotations
let add d (k, v) = fun k' -> if k' = k then v else d k'
and then represented in the form that makes currying explicit, is
let add = fun d -> fun (k, v) -> fun k' -> if k' = k then v else d k'
So it is a function that for given dictionary d
produces a function that takes a pair (k,v)
and returns a function that for all k'
that are equal to k
will return d
. And if keys are different, then it applies the key to the first parameter d
. Let's take a deeper look at the
fun k' -> if k' = k then v else d k'
literal. As we can see variables d
and k
occur free in that function, i.e., they are not bound by the parameter k
. When a function references a variable from the outer scope (which is not defined on the module level), a closure is created. A closure is an opaque object that contains a pointer to the code1 (in our case it is if k' = k then v else d k'
and pointers to each captured variable (i.e., to each variable that is taken from the outer scope). In our case, it creates (roughly) a record that has three entries, d
, k
and the code. After a few hours of mediation, it is not hard to see, that a function dictionary is just singly-linked list of closures, as d
is a pointer to the closure, which, in turn, contains a pointer to another closure, until we reach the pointer to empty
, which returns None
no matter the key (and is coded incorrectly in your example, as it should expect any key, not for the value of type unit).
With all that said and kept in mind, we can now write the short-hand notation for the add
function:
let add d k k' = if k' = k then v else d k'
which has absolutely the same semantics (i.e., meaning, behavior) as any other notations, as it is just the syntactic sugar.
1)To prevent any confusion, the code itself is never stored in the heap memory. Even when you have an anonymous function that is not bound to any name, e.g., (fun x y -> x + y) 2 3
the compiler will emit the code and store it in the text section of a binary and give it some funny name. So the code above will be just a normal call to a function with mangled name, e.g., call Testing_fun001
. And going back to the functional dictionary example, each entry will be an object with three words, the key, the data, the pointer to the next entry. Basically the same underlying representation as an association list.