What is in a Python closure and what are the caveats for people used to OCaml?

Question

This is a follow-up to an old answer to a question about the necessity of functools.partial: that answer explains the phenomenon and the basic reason for it very clearly, but some points are still unclear to me.

To recap, the following Python code

myfuns = [lambda arg: str(arg) + str(clo) for clo in range(4)]
try:
    clo
except NameError:
    print("there is no clo")
for arg in range(4):
    print(myfuns[arg](arg), end=", ")

gives 03, 13, 23, 33, , while the similar OCaml code

let myfuns = Array.map (fun clo -> fun arg -> (string_of_int arg) ^ (string_of_int clo)) [|0;1;2;3|];;
(* there is obviously no clo variable here *)
for arg = 0 to 3 do
  print_string (myfuns.(arg) arg); print_string ", "
done;;

gives 00, 11, 22, 33, .

I understand this is related to a different notion of closure applied to lambda arg: str(arg) + str(clo) and its correspondent fun arg -> (string_of_int arg) ^ (string_of_int clo).

In OCaml, the closure maps the identifier clo to the value of the variable clo in the outer scope at the time the closure is created. In Python, the closure somehow contains the variable clo per se, which explains why it is affected by the successive assignments made by the for clause.

Is this correct ?

How is this done ? The clo variable does not exist in the global scope, as evidenced by my try/except construct. Generally, I would assume that the variable of a generator is local to it and so does not survive it. So, again, where is clo ? This answer gives insight about __closure__ but I still do not completely grasp how it manages to refer to the clo variable per se during the generation.

Also, beside this strange behaviour (for people used to statically binding languages), are there other caveats one should be aware of ?

ivg · Accepted Answer · 2019-09-03T17:44:09.890

When Python creates a closure, it collects all free variables into a tuple of cells. Since each cell is mutable, and Python passes a reference to the cell into the closure, you will see the last value of the induction variable in your loop. Let's look under the hood. Here is our function, with i occurring free in the lambda expression:

def make_closures():
    return [lambda x: str(x) + str(i) for i in range(4)]

and here is the disassembly of this function (the output is from CPython 2; opcode names and offsets differ in Python 3):

  2           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (4)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                21 (to 37)
             16 STORE_DEREF              0 (i)
             19 LOAD_CLOSURE             0 (i)
             22 BUILD_TUPLE              1
             25 LOAD_CONST               2 (<code object <lambda>)
             28 MAKE_CLOSURE             0
             31 LIST_APPEND              2
             34 JUMP_ABSOLUTE           13
        >>   37 RETURN_VALUE

We can see that STORE_DEREF at offset 16 takes a normal integer value from the top of the stack (TOS) and stores it in a cell. The next three instructions prepare the closure structure on the stack, and finally MAKE_CLOSURE packs everything into the closure, which is represented as a tuple (in our case a 1-tuple) of cells:

 >>> fs = make_closures()
 >>> fs[0].__closure__
 (<cell at 0x7ff688624f30: int object at 0xf72128>,)

so it is a tuple with a cell containing an int,

 >>> fs[0].__closure__[0]
 <cell at 0x7ff688624f30: int object at 0xf72128>

 >>> type(fs[0].__closure__[0])
 cell

The crucial point here is that the cells holding free variables are shared by all the closures:

>>> fs[0].__closure__
(<cell at 0x7f1d63f08b40: int object at 0xf16128>,)

>>> fs[1].__closure__
(<cell at 0x7f1d63f08b40: int object at 0xf16128>,)
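
On Python 3 the same sharing can be checked directly, without comparing cell addresses by eye. The following is a minimal sketch; the names shared_cell and seen_value are illustrative and not part of the original answer.

```python
def make_closures():
    return [lambda x: str(x) + str(i) for i in range(4)]

fs = make_closures()

# Every lambda holds the very same cell object, not a private copy.
shared_cell = all(f.__closure__[0] is fs[0].__closure__[0] for f in fs)

# The cell holds whatever was stored in it last, i.e. the final loop value.
seen_value = fs[0].__closure__[0].cell_contents

print(shared_cell, seen_value)  # True 3
```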

Each cell is a reference to a local variable in the enclosing function scope. Indeed, we can find the i variable listed in the co_cellvars attribute of the code object of make_closures:

>>> make_closures.func_code.co_cellvars
('i',)

Therefore, we get the somewhat surprising effect of an integer value being passed by reference and becoming mutable. The main surprises in Python are the way variables are packed into cells and the fact that the for loop does not have its own scope.
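
The mutability of a captured variable can also be observed from plain Python, without touching __closure__. A minimal sketch, assuming Python 3 (needed for nonlocal); make_counter, bump and peek are hypothetical names used only for illustration:

```python
def make_counter():
    count = 0  # becomes a cell because the inner functions refer to it

    def bump():
        nonlocal count  # rebind the enclosing variable, not a new local
        count += 1
        return count

    def peek():
        return count  # reads the same cell that bump writes to

    return bump, peek

bump, peek = make_counter()
bump()
bump()
print(peek())  # 2
```

Both inner functions see the same variable, so an update made through one is visible through the other, even though the value is an immutable int.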

To be fair, you can achieve the same result in OCaml if you manually create a reference and capture it in a closure, e.g.,

let make_closures () =
  let arg = ref 0 in
  let fs = Array.init 4 (fun _ -> fun _ -> assert false) in
  for i = 0 to 3 do
    fs.(i) <- (fun x -> string_of_int x ^ string_of_int !arg);
    incr arg
  done;
  fs

so that

let fs = make_closures ();;
fs.(1) 1;;
- : string = "14"

Historical References

Both OCaml and Python are influenced by Lisp and both employ the same technique for implementing closures. Surprisingly, the results differ, not because of different interpretations of lexical scoping or of the closure environment, but because of the different object (data) models of the two languages.

The OCaml data model is not only simpler to understand but is also well defined by a rigorous type system. Python, due to its dynamic structure, leaves a lot of freedom in the interpretation of objects and their representation. Therefore, the Python designers decided to make variables bound in the lexical context of a closure mutable by default (even if they hold integers). See also PEP 227 for more context.

Do you have a reference about this part of the Python semantics that does not involve the C interface or the bytecode implementation and/or the reason/advantages of doing so ? Also, after thinking about it I find that a more correct translation of the OCaml code into Python would be ``myfuns = [(lambda x : lambda arg : str(arg)+str(x))(clo) for clo in range(4)]``. This is not an eta-conversion because ``x`` is free in ``lambda arg : str(arg)+str(x)``. — ysalmon, Aug 12 '19 at 13:45
Python is inherently defined by the implementation, so we can talk only about CPython. The rest could be different, but all other implementations are trying to follow CPython. I've added a couple of references, but it is hard to get more, since it was mostly implemented before the Internet era. — ivg, Aug 13 '19 at 18:06
"Python is a pass-by-reference language" Python is a solely pass-by-value language, in the same way that Java is a solely pass-by-value language. Python closures, however, capture variables by reference. — newacct, Sep 02 '19 at 03:05
I don't want to be dragged into the pass-by-value vs pass-by-reference terminology holy war (especially since Python is neither, and the distinction hardly matters nowadays), so I just removed this ambiguous notion from the posting. — ivg, Sep 03 '19 at 17:46

score 1 · Answer 2 · answered Aug 05 '19 at 13:05

The difference is that Python has variables, while OCaml has bindings and currying.

Python:
myfuns = [lambda arg: str(arg) + str(clo) for clo in range(4)]

The for loop creates a single variable clo and assigns the values 0, 1, 2, 3 to it, one per iteration. The lambda captures the variable itself so that it can look it up later, when str(clo) is evaluated. Since the loop last assigned 3 to clo, all the lambdas append the same string.

OCaml:
let myfuns = Array.map (fun clo -> fun arg -> (string_of_int arg) ^ (string_of_int clo)) [|0;1;2;3|];;

Here you call Array.map with the array [|0;1;2;3|]. This will evaluate the fun clo -> ... binding clo to each value in the array in turn. Each time the binding will be different so the string_of_int clo turns out different too.

While this is not the only difference, the same partial evaluation saves the day in Python too, if you write your code like this:

Python:
def make_lambda(clo):
    return lambda arg: str(arg) + str(clo)
myfuns = [make_lambda(clo) for clo in range(4)]

The evaluation of make_lambda causes the clo in the lambda to be bound to the value of the make_lambda argument, not the variable in the for loop.
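
One way to see why this works is to inspect the cells again: every call to make_lambda creates a fresh local variable clo, hence a fresh cell. A sketch, assuming Python 3; the names cells, all_distinct and contents are illustrative:

```python
def make_lambda(clo):
    return lambda arg: str(arg) + str(clo)

myfuns = [make_lambda(clo) for clo in range(4)]

# One cell per call to make_lambda, each holding that call's argument.
cells = [f.__closure__[0] for f in myfuns]
all_distinct = len({id(c) for c in cells}) == len(cells)
contents = [c.cell_contents for c in cells]

print(all_distinct, contents)                   # True [0, 1, 2, 3]
print([myfuns[arg](arg) for arg in range(4)])   # ['00', '11', '22', '33']
```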

Another fix is binding the value in the lambda explicitly:

myfuns = [lambda arg, clo=clo: str(arg) + str(clo) for clo in range(4)]
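
Since the question started from functools.partial, here is how the same fix might look with it. This is a sketch; concat is a hypothetical helper introduced for the example. partial stores the value that clo has at the moment partial is called, so later reassignments of the loop variable have no effect:

```python
from functools import partial

def concat(clo, arg):
    return str(arg) + str(clo)

# partial(concat, clo) freezes the current value of clo as the first argument.
myfuns = [partial(concat, clo) for clo in range(4)]

results = [myfuns[arg](arg) for arg in range(4)]
print(results)  # ['00', '11', '22', '33']
```

Unlike the clo=clo default, the partial object does not expose an extra parameter that a caller could override by accident: with the default-argument version, a stray second positional argument silently replaces clo.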

score 1 · Answer 3 · answered Aug 14 '19 at 08:56

You already have a couple of excellent answers, but to focus on the essence, the difference is due to two design choices Python made:

All variable bindings are mutable, and captured as such in closures.
for comprehensions do not bind a different variable for every iteration, but reassign a new value to the same one.

Neither design choice is necessary, in particular not the latter. For example, in OCaml, the variable of a for-loop is not mutable, but a fresh binding for each iteration. Even more interesting, in JavaScript, for (let x of ...) ... will make x mutable (unless you use const instead), but it still is separate for every iteration. That fixes the behaviour of JavaScript's older for (var x in ...), which has the same problem as Python and is notorious for leading to subtle bugs with closures.
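
The second point is easy to observe with an ordinary for statement, whose variable stays in scope after the loop. In this sketch (build_with_loop is an illustrative name), rebinding clo after the loop changes what every previously created lambda returns:

```python
def build_with_loop():
    funs = []
    for clo in range(4):
        funs.append(lambda arg: str(arg) + str(clo))

    # clo is still 3 here: the loop reassigned one variable four times.
    before = [f(i) for i, f in enumerate(funs)]

    # Rebinding the same variable is visible to all the lambdas.
    clo = 9
    after = [f(i) for i, f in enumerate(funs)]
    return before, after

before, after = build_with_loop()
print(before)  # ['03', '13', '23', '33']
print(after)   # ['09', '19', '29', '39']
```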
