
I'd like to do the following:

for every nested function f anywhere in this_py_file:
    if has_free_variables(f):
        print warning

Why? Primarily as insurance against the late-binding closure gotcha as described elsewhere. Namely:

>>> def outer():
...     rr = []
...     for i in range(3):
...         def inner():
...             print i
...         rr.append(inner)
...     return rr
... 
>>> for f in outer(): f()
... 
2
2
2
>>> 

And whenever I get warned about a free variable, I would either add an explicit exception (in the rare case that I would want this behaviour) or fix it like so:

...         def inner(i=i):

Then the behaviour becomes more like nested classes in Java (where any variable to be used in an inner class has to be final).
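For reference, a self-contained sketch of the fixed version. It is written in Python 3 syntax (the snippets above use Python 2's print statement), and `inner` returns its value rather than printing it so the result is easy to check; both choices are assumptions made for illustration.

```python
def outer():
    rr = []
    for i in range(3):
        # i=i evaluates i now and stores it as a default argument,
        # so inner no longer looks i up in outer's scope when called.
        def inner(i=i):
            return i
        rr.append(inner)
    return rr


funcs = outer()
results = [f() for f in funcs]
print(results)  # [0, 1, 2]

# With the default argument in place, inner has no free variables left.
print([f.__code__.co_freevars for f in funcs])  # [(), (), ()]
```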

(As far as I know, besides solving the late-binding issue, this will also promote better use of memory, because if a function "closes over" some variables in an outer scope, then the outer scope cannot be garbage collected for as long as the function is around. Right?)

I can't find any way to get hold of functions nested in other functions. Currently, the best way I can think of is to instrument a parser, which seems like a lot of work.

Evgeni Sergeev
  • What are you trying to do exactly? – kpie May 03 '16 at 02:53
  • This is an [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) on a massive scale. If you're worried about late-binding closures in a piece of code, _rewrite the code to not suffer from it_, don't do a massive try-except style thing after the fact. Do you not have control over the code that could potentially be doing this? – Akshat Mahajan May 03 '16 at 03:00
  • Also, people have asked about [late-binding closures before](http://stackoverflow.com/questions/6035848/python-closure-not-working-as-expected) and have received answers that help mitigate the same. Following suit would serve you better than your current approach. – Akshat Mahajan May 03 '16 at 03:04
  • "if a function "closes over" some variables in an outer scope, then the outer scope cannot be garbage collected for as long as the function is around" - no, just the variables the function needs, and removing the closure will just mean you have to keep those references some other way. – user2357112 May 03 '16 at 03:10
  • @AkshatMahajan Have you come across HTML validators or [lint checkers](http://stackoverflow.com/q/8503559/1143274)? This is exactly how I'm planning to do this. Otherwise, think of asserts that are turned off before code is released or the debug version of STL. This isn't an XY problem unless you can suggest a better alternative (while I do appreciate the "rewrite all your code" joke :) – Evgeni Sergeev May 03 '16 at 03:13
  • @EvgeniSergeev So you're writing a lint checker? Okay, then this makes sense. Thanks for clarifying. I do think a parser might serve you better, though - the grammar specification for Python is [online](https://docs.python.org/2/reference/grammar.html), after all. Or you could just use [`ast`](https://docs.python.org/3/library/ast.html) to parse existing Python code. – Akshat Mahajan May 03 '16 at 03:21
  • @AkshatMahajan Your second comment misses the point. Yes, I am already following the suggested approach, as you can see from my question — it's hard to make that any more clear or concise. What I need is a mechanism that reminds me that I'm following that suggested approach *everywhere*; that I haven't forgotten to follow it. Does that really need to be pointed out explicitly? – Evgeni Sergeev May 03 '16 at 03:26
  • @EvgeniSergeev I apologise, I must be missing something. You also mention thinking about "instrumenting a parser" at the end - what is _that_ in reference to, if you've already done that? It was that statement that led me to think you were trying to write something to debug your own code using in-built functions in the first place. Sorry, I just think I've gotten the wrong end of the stick here otherwise. – Akshat Mahajan May 03 '16 at 03:35
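As a possible alternative to instrumenting a parser by hand, the standard-library `symtable` module exposes the compiler's own scope analysis, including each function's free variables. The following is only a sketch in Python 3; the helper name `closures_in_source` and the sample source string are made up for illustration.

```python
import symtable


def closures_in_source(source, filename="<string>"):
    """Return (name, line, free variables) for every function scope in
    `source` that refers to variables of an enclosing function."""
    found = []

    def visit(table):
        for child in table.get_children():
            # Only function scopes have get_frees(); class scopes do not.
            if isinstance(child, symtable.Function) and child.get_frees():
                found.append(
                    (child.get_name(), child.get_lineno(), child.get_frees())
                )
            visit(child)  # descend into arbitrarily deep nesting

    visit(symtable.symtable(source, filename, "exec"))
    return found


# Hypothetical sample input: the late-binding example from the question.
SAMPLE = '''
def outer():
    rr = []
    for i in range(3):
        def inner():
            return i
        rr.append(inner)
    return rr
'''

report = closures_in_source(SAMPLE)
for name, lineno, frees in report:
    print("warning: %s (line %d) uses free variables: %s"
          % (name, lineno, ", ".join(frees)))
```

Because this works on source text, the file being checked never has to be imported or executed, which fits a lint-style check.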

4 Answers

Consider the following function:

def outer_func():
    outer_var = 1

    def inner_func():
        inner_var = outer_var
        return inner_var

    outer_var += 1
    return inner_func

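The outer function's __code__ attribute holds its code object, and the code object of the inner function is stored among that object's constants (co_consts):

outer_code = outer_func.__code__
inner_code = outer_code.co_consts[2]  # index 2 in this example; the position depends on the other constants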

From this code object, the free variables can be recovered:

inner_code.co_freevars # ('outer_var',)

You can check whether or not a constant is a code object that should be inspected with:

hasattr(inner_code, 'co_freevars') # True

After you get all the functions from your file, this might look something like:

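for func in function_list:
    # Nested functions are stored as code objects among func's constants.
    for const in func.__code__.co_consts:
        # Only code objects have co_freevars; this checks one level of nesting.
        if hasattr(const, 'co_freevars'):
            assert len(const.co_freevars) == 0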

Someone who knows more about the inner workings can probably provide a better explanation or a more concise solution.
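One way the idea above might be extended to arbitrary nesting depth and to a whole file is sketched below, in Python 3. The helper names `iter_nested_code` and `find_closures` and the sample source are assumptions for illustration, not part of any library. Compiling the source text gives a module-level code object, so no list of functions has to be collected first.

```python
import types


def iter_nested_code(code):
    """Yield every code object nested at any depth inside `code`."""
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield const
            yield from iter_nested_code(const)


def find_closures(code):
    """Return (name, first line, free variables) for each nested closure."""
    return [
        (c.co_name, c.co_firstlineno, c.co_freevars)
        for c in iter_nested_code(code)
        if c.co_freevars
    ]


# Hypothetical sample input: the late-binding example from the question.
SOURCE = '''
def outer():
    rr = []
    for i in range(3):
        def inner():
            return i
        rr.append(inner)
    return rr
'''

# compile() does not execute anything; it only produces the code object
# whose constants contain the code objects of the top-level functions.
module_code = compile(SOURCE, "<example>", "exec")
closures = find_closures(module_code)
for name, lineno, freevars in closures:
    print("warning: %s (line %d) closes over %s"
          % (name, lineno, ", ".join(freevars)))
```

Depending on the Python version, comprehensions and methods that call zero-argument super() can also show up with free variables, so a real checker would probably need a way to whitelist those cases.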

Jared Goguen

To "get a hold" of your nested functions (even though each definition overwrites the previous one), you would have to use eval to give every definition its own name:

def outer():
     rr = []
     for i in range(3):
         eval("def inner"+str(i)+"""():
             print """+str(i))
         rr.append(eval("inner"+str(i)))
     return rr

for f in outer(): f()

prints

1
2
3
kpie
  • That forces the closure to be evaluated at assignment time, which is good, but not always the right solution. Imagine if you wanted to code a stateful function that's only meant to make, say, database calls later on in the code - this would force a database call right away, rather than later on as planned. This is a hack, not a complete solution. Good try, though :D – Akshat Mahajan May 03 '16 at 03:01
  • Interesting. my code doesn't work. and here I was so confident... The usual dilemma. – kpie May 03 '16 at 03:03
  • I didn't mention it, but it should be clear that a real solution has to be a "set and forget" style feature. It will become part of the development harness and there is absolutely no way that it should influence coding style or change existing code. Even adding a decorator to every nested function would be overkill. – Evgeni Sergeev May 03 '16 at 03:17

I also wanted to do this in Jython, but the approach shown in the accepted answer doesn't work there, because co_consts isn't available on Jython code objects. (There also doesn't seem to be any other way to query a code object for the code objects of its nested functions.)

But of course, the code objects are there somewhere, we have the source and full access, so it's only a matter of finding an easy way within a reasonable amount of time. So here's one way that works. (Hold on to your seats.)

Suppose we have code like this in module mod:

def outer():
    def inner():
        print "Inner"

First get the code object of the outer function directly:

code = mod.outer.__code__

In Jython, this is an instance of PyTableCode. Reading the source, we find that the actual functions are implemented in a Java class generated from your script, and that the code object's funcs field references an instance of that class. (All classes generated from scripts are subclasses of PyFunctionTable, hence that's the declared type of funcs.) The field isn't visible from within Jython, thanks to some magic machinery, which is a designer's way of saying that you're accessing these things at your own risk.

So we need to dive into the Java for a moment. A class like this does the trick:

import java.lang.reflect.Field;

public class Getter {
    public static Object getFuncs(Object o) 
    throws NoSuchFieldException, IllegalAccessException {
        Field f = o.getClass().getDeclaredField("funcs");
        f.setAccessible(true);
        return f.get(o);
    }
}

Back in Jython:

>>> import Getter
>>> funcs = Getter.getFuncs(mod.outer.__code__)
>>> funcs
mod$py@1bfa3a2

Now, this funcs object has all of the functions declared anywhere in the Jython script (those nested arbitrarily, within classes, etc.). Also, it has fields holding the corresponding code objects.

>>> fields = funcs.class.getDeclaredFields()

In my case, the code object corresponding to the nested function happens to be the last one:

>>> flast = fields[-1]
>>> flast
static final org.python.core.PyCode mod$py.inner$24

To get the code object of interest:

>>> flast.setAccessible(True)
>>> inner_code = flast.get(None)  #"None" because it's a static field.
>>> dir(inner_code)
co_argcount co_filename    co_flags co_freevars co_name co_nlocals co_varnames
co_cellvars co_firstlineno

The rest is the same as in the accepted answer: check co_freevars (which, unlike co_consts, is available in Jython).

A good thing about this approach is that you enumerate exactly all code objects that are declared anywhere within the source code file: functions, methods, generators, whether or not they are nested under anything or under each other. There is nowhere else for them to hide.

Evgeni Sergeev

You need to import copy and use rr.append(copy.copy(inner)).

https://pymotw.com/2/copy/

kpie
  • This does not answer the question. He isn't asking to avoid late-binding closures, he's asking to *detect* them. – Jared Goguen May 03 '16 at 03:19
  • Unfortunately this doesn't actually work. I think different `inner()` functions are created anyway (in some sense), but the trouble is that they all keep a free variable reference (using terminology in some loose sense) to the outer scope, and only look up the value of `i` in that scope when they are called. By which time it is `2`. This would appear as strange semantics from the point of view of C++/Java language model; this is more like what happens in JavaScript and maybe Ruby. – Evgeni Sergeev May 08 '16 at 12:10