
I have been playing with the inspect module from Python's standard library.

The following examples work just fine (assuming that inspect has been imported):

def foo(x, y):
    return x - y
print(inspect.getsource(foo))

... will print def foo(x, y):\n    return x - y\n and ...

bar = lambda x, y: x / y
print(inspect.getsource(bar))

... will print bar = lambda x, y: x / y\n. So far so good. Things become a little odd in the following examples, however:

print(inspect.getsource(lambda x, y: x / y))

... will print print(inspect.getsource(lambda x, y: x / y)) and ...

baz = [2, 3, lambda x, y: x / y, 5]
print(inspect.getsource(baz[2]))

... will print baz = [2, 3, lambda x, y: x / y, 5].

The pattern seems to be that getsource returns every source line the function touches, regardless of context. Everything else on those lines, in my case anything other than the desired function source / definition, is included as well. Is there an alternative approach that would allow extracting something that represents a function's source code - and only its source code - preferably in some anonymous fashion?


EDIT (1)

def foo(x, y):
    return x - y
bar = [1, 2, foo, 4]
print(inspect.getsource(bar[2]))

... will print def foo(x, y):\n    return x - y\n.

s-m-e
  • Maybe that can help: https://stackoverflow.com/a/21339223/1720199 – cglacet Mar 11 '19 at 17:35
  • @cglacet Thanks for the idea. I looked at [dill](https://github.com/uqfoundation/dill) and its source code and tested it. In the context of my question, it is a mildly improved wrapper around `inspect.getsource`, covering a few edge cases like functions defined in interactive environments. But it does not solve my fundamental problem - no difference here. – s-m-e Mar 11 '19 at 18:45
  • Do you "only" need to capture lambdas? If yes, then maybe it is possible to write a regex to extract lambdas from lines of code. But I guess that may be a bit complex to do. – cglacet Mar 11 '19 at 18:56
  • @cglacet I am looking for a generic solution working on any kind of function definition / function pointer / function reference. It appears that lambdas are the more complicated edge case ... function definitions with `def` are easier. If I have a reference to a "real" Python function inside a list for instance, similar to my example above with `baz[2]`, `inspect` will indeed deliver the source of the function instead of the definition of the list. – s-m-e Mar 11 '19 at 19:05
  • Your example above doesn't give the line in which the list is defined, but the line in which item 2 appears – cglacet Mar 11 '19 at 19:08
  • @cglacet Exactly. See edit below my question. In this case, it is actually the desired behavior (as far as I am concerned ...) – s-m-e Mar 11 '19 at 19:11
  • @cglacet For full disclosure (and better examples / an actual use case), I am working on this [open source package](https://github.com/pleiszenburg/zugbruecke) and experimenting with better implementations of this [feature](https://zugbruecke.readthedocs.io/en/develop/memsync.html#key-f-custom-function-for-computing-the-length-of-the-memory-segment-optional). I want to enable my users to pass a function pointer instead of a string which can be parsed into a function. – s-m-e Mar 11 '19 at 19:13
  • I think the string may be a better idea. Allowing a function carries an implication that the function will be able to access helper functions and imports and closure variables and other things it relies on from the scope where the function was defined, which doesn't seem to be the case, going by my quick read of the zugbruecke documentation. – user2357112 Mar 11 '19 at 19:39

1 Answer


Unfortunately, that's not possible with inspect, and it's unlikely to work without parsing (and compiling) the source code again. inspect's getsource function is rather limited: it calls getsourcelines, which in turn calls findsource, which essentially unwraps your object until it ends up at a PyCodeObject.

At that point, we're dealing with compiled bytecode. All that's left from the original source are fragments and hints, such as co_firstlineno:

/* Bytecode object */
typedef struct {
    /* ... other fields omitted ... */
    int co_firstlineno;         /* first source line number */
    PyObject *co_code;          /* instruction opcodes */
    /* ... other fields omitted ... */
} PyCodeObject;
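From Python, the same hints are visible as attributes of a function's `__code__`. The following is only a minimal sketch: the source string and the `<example>` file name are made up for illustration. It shows that two lambdas defined on one line are indistinguishable by name and first line number:

```python
# Illustrative source: two lambdas defined on the same line.
source = "plus, minus = lambda x: x + 1, lambda y: y - 1\n"

namespace = {}
# "<example>" is a placeholder file name, not a real file.
exec(compile(source, "<example>", "exec"), namespace)

plus_code = namespace["plus"].__code__
minus_code = namespace["minus"].__code__

# Both code objects report the same first line and the same generic name.
first_lines = (plus_code.co_firstlineno, minus_code.co_firstlineno)
names = (plus_code.co_name, minus_code.co_name)
print(first_lines, names)
```

Neither attribute tells the two functions apart, which is why a lookup based on the line number alone returns the whole line.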

By the way, similar to the PyCodeObject, a PyFrameObject also contains only a f_lineno, but no column, which explains why tracebacks only include the file name as well as the line: the column isn't compiled into the bytecode.

As the bytecode does not contain any more specific regions than the (first) line, it's not possible to get the exact source location from inspect or any other library that only uses the (public) bytecode information without further parsing. This also holds true for any other option that only uses the bytecode, such as pickle.

inspect takes the public information (co_firstlineno) and then simply searches for a suitable beginning of a function and the end of the surrounding block. That is almost enough, but it only finds some block on that line, not necessarily the correct one, and at the moment it cannot do better: it tokenizes the full line without knowing at which column the object starts, so it cannot determine the corresponding source code region either.

Let's say we have

plus, minus, mult = lambda x: x + 1, lambda y: y - 1, lambda z: z * 5

and we want just minus. As the bytecode does not contain a co_firstcolumn, we only have the full line available. We could parse all lambdas, but we still don't know which lambda fits our co_code. We would need to compile them again and check whether their bytecode fits the original one.
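A rough sketch of that recompile-and-compare idea might look like the following. The helper names `find_lambda_source` and `_fingerprint` are hypothetical, and the sketch assumes Python 3.8+ (for `ast.get_source_segment`), a statement that fits on a single line, and lambdas without closures, default arguments, or nested lambdas. It evaluates each candidate lambda expression in an empty namespace, so it should only be run on source you trust:

```python
import ast


def _fingerprint(code):
    # Compare only what survives recompilation; co_firstlineno and
    # co_filename are ignored on purpose because they will differ.
    return (code.co_code, code.co_consts, code.co_names, code.co_varnames)


def find_lambda_source(source_line, target):
    """Return the text of every lambda in source_line whose bytecode matches target."""
    source_line = source_line.strip()
    wanted = _fingerprint(target.__code__)
    matches = []
    for node in ast.walk(ast.parse(source_line)):
        if not isinstance(node, ast.Lambda):
            continue
        # Cut the lambda's own text out of the line using the AST positions.
        text = ast.get_source_segment(source_line, node)
        # Recompile just that expression and build a function from it.
        candidate = eval(compile(text, "<candidate>", "eval"), {})
        if _fingerprint(candidate.__code__) == wanted:
            matches.append(text)
    return matches


line = "plus, minus, mult = lambda x: x + 1, lambda y: y - 1, lambda z: z * 5"
namespace = {}
exec(compile(line, "<example>", "exec"), namespace)

result = find_lambda_source(line, namespace["minus"])
print(result)
```

For a real function, the line would presumably come from `inspect.getsource`. Note that two lambdas with identical bodies on the same line would both match, so the result is a list rather than a single string.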

In the end, we have to do exactly that: parse the source again and find the correct PyCodeObject. It would be a lot easier if we had at least a starting column number, as we could then rely on syntactic analysis alone, but the code object only preserves line numbers at the moment. So either inspect needs a big patch, or the bytecode needs to include the starting column of the compiled object.

Zeta