I am trying to analyze some messy code that uses global variables quite heavily within functions (I want to refactor it so that functions only use local variables). Is there any way to detect the global variables a function uses?

For example:

def f(x):
    x = x + 1
    z = x + y
    return z

Here the global variable is y, since it is neither passed as an argument nor created within the function.

I tried to detect global variables within the function using string parsing, but it was getting a bit messy; I was wondering if there was a better way to do this?

Edit: If anyone is interested, this is the code I am using to detect global variables (based on kindall's answer and on Paolo's answer to the question "Capture stdout from a script in Python"):

from dis import dis

def capture(f):
    """
    Decorator to capture standard output
    """
    def captured(*args, **kwargs):
        import sys
        from cStringIO import StringIO

        # setup the environment
        backup = sys.stdout

        try:
            sys.stdout = StringIO()     # capture output
            f(*args, **kwargs)
            out = sys.stdout.getvalue() # release output
        finally:
            sys.stdout.close()  # close the stream 
            sys.stdout = backup # restore original stdout

        return out # captured output wrapped in a string

    return captured

def return_globals(f):
    """
    Prints all of the global variables in function f
    """
    x = dis_(f)
    for i in x.splitlines():
        if "LOAD_GLOBAL" in i:
            print i

dis_ = capture(dis)

return_globals(f)  # prints the LOAD_GLOBAL rows of f's disassembly

dis prints its output instead of returning it, so to manipulate that output as a string I wrap dis in the capture decorator written by Paolo (from "Capture stdout from a script in Python").
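The code above targets Python 2 (cStringIO, the print statement). As a minimal sketch of the same idea on Python 3, and assuming only the standard library, dis.dis accepts a file argument, so the disassembly can be written straight into an in-memory buffer without a capture decorator. The helper name global_lines is made up for this illustration.

```python
import dis
import io


def global_lines(func):
    """Return the rows of func's disassembly that mention LOAD_GLOBAL.

    global_lines is a hypothetical helper name used only for this sketch.
    """
    buf = io.StringIO()
    dis.dis(func, file=buf)  # write the disassembly into the buffer
    return [line for line in buf.getvalue().splitlines()
            if "LOAD_GLOBAL" in line]


def f(x):
    x = x + 1
    z = x + y  # y is never assigned here, so it is looked up as a global
    return z


for line in global_lines(f):
    print(line)
```

The exact layout of each printed row differs between Python versions, so this is more useful for eyeballing than for parsing.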

applecider
  • As it happens I also wrote a way to capture stdout. :-) http://stackoverflow.com/a/16571630/416467 – kindall Oct 17 '15 at 00:16

2 Answers

Inspect the bytecode.

from dis import dis
dis(f)

Result:

  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               1 (1)
              6 BINARY_ADD
              7 STORE_FAST               0 (x)

  3          10 LOAD_FAST                0 (x)
             13 LOAD_GLOBAL              0 (y)
             16 BINARY_ADD
             17 STORE_FAST               1 (z)

  4          20 LOAD_FAST                1 (z)
             23 RETURN_VALUE

The global variables will have a LOAD_GLOBAL opcode instead of LOAD_FAST. (If the function changes any global variables, there will be STORE_GLOBAL opcodes as well.)
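To see the store case as well, here is a small sketch for Python 3 that uses dis.get_instructions, which yields instruction objects instead of printing text. The function bump and the variable counter are invented for the example.

```python
import dis


def bump():
    # Hypothetical example function: reads and rebinds a module-level name.
    global counter
    counter = counter + 1


# Keep only the instructions that read or write a global name.
global_ops = [(ins.opname, ins.argval)
              for ins in dis.get_instructions(bump)
              if ins.opname in ("LOAD_GLOBAL", "STORE_GLOBAL")]
print(global_ops)
```

Both a LOAD_GLOBAL and a STORE_GLOBAL for counter should appear, because the global statement makes the assignment target the module namespace rather than a local.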

With a little work, you could even write a function that scans the bytecode of a function and returns a list of the global variables it uses. In fact:

from dis import HAVE_ARGUMENT, opmap

def getglobals(func):
    GLOBAL_OPS = opmap["LOAD_GLOBAL"], opmap["STORE_GLOBAL"]
    EXTENDED_ARG = opmap["EXTENDED_ARG"]

    func = getattr(func, "im_func", func)
    code = func.func_code
    names = code.co_names

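    # Python 2 bytecode: one opcode byte, followed by a 2-byte little-endian
    # argument for every opcode >= HAVE_ARGUMENT.
    op = (ord(c) for c in code.co_code)
    globs = set()
    extarg = 0

    for c in op:
        if c in GLOBAL_OPS:
            # The argument is an index into co_names.
            globs.add(names[next(op) + next(op) * 256 + extarg])
        elif c == EXTENDED_ARG:
            # Supplies the high 16 bits of the next opcode's argument.
            extarg = (next(op) + next(op) * 256) * 65536
            continue
        elif c >= HAVE_ARGUMENT:
            # Skip the argument bytes of opcodes we are not interested in.
            next(op)
            next(op)

        extarg = 0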

    return sorted(globs)

print getglobals(f)               # ['y']
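The function above reads Python 2 bytecode by hand (func_code, im_func, three-byte instructions), and that layout changed in later versions. A rough Python 3 equivalent, offered as a sketch and not as a drop-in replacement, can let dis.get_instructions do the decoding. The name get_globals and the include_builtins flag are assumptions made for this example. Since names such as len are also loaded with LOAD_GLOBAL, the sketch optionally filters out anything found in the builtins module, which would also hide a real global that shadows a builtin.

```python
import builtins
import dis

GLOBAL_OPS = {"LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL"}


def get_globals(func, include_builtins=False):
    """Return a sorted list of global names used directly by func.

    get_globals and include_builtins are hypothetical names for this sketch.
    Only func's own bytecode is scanned, not nested functions.
    """
    names = {ins.argval
             for ins in dis.get_instructions(func)
             if ins.opname in GLOBAL_OPS}
    if not include_builtins:
        # Builtins are looked up with LOAD_GLOBAL too; drop them by name.
        names -= set(dir(builtins))
    return sorted(names)


def f(x):
    x = x + 1
    z = x + y
    return z


print(get_globals(f))  # ['y']
```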
kindall
  • what are your thoughts on using print(globals())? – idjaw Oct 16 '15 at 00:47
  • That depends a lot on state, i.e., which global variables have been defined by the particular sequence of function calls you've done (assuming some of the functions set globals). `dis` is safer because the Python parser has already decided which variables are locals when it generates the bytecode, so it knows which must be globals even if they haven't been defined yet. – kindall Oct 16 '15 at 00:49
  • Brilliant! That was the short sweet pythonic answer I was looking for. `dis` looks like a really cool library, I'll have to dig into that later. @idjaw `print(globals())` would print all globals in the script and not just those within a function of interest. – applecider Oct 16 '15 at 00:51
  • Thanks for the explanation. That makes sense. Cheers! – idjaw Oct 16 '15 at 00:51
  • @kindall I added a full solution to the question that returns only the global variables in a function (more specifically prints only the rows of `dis`'s output that correspond to global variables). Just curious, what is `dis` typically used for? Are there other obvious practical applications? – applecider Oct 16 '15 at 01:16
  • @applecider: Updated my answer with a function that scans the bytecode of a function and returns a list of the global variables it uses. – kindall Oct 16 '15 at 17:15
  • @applecider: As for what `dis` is useful for, mainly it's handy to see what primitive instructions Python turned your code into, or comparing two ways of doing something to see which uses fewer primitive operations. – kindall Oct 16 '15 at 17:23

As mentioned in the LOAD_GLOBAL documentation:

LOAD_GLOBAL(namei)

Loads the global named co_names[namei] onto the stack.

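This means you can inspect the code object for your function to find globals. Bear in mind that co_names also holds attribute names and names used by import statements, so it can list more than just the globals: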

>>> f.__code__.co_names
('y',)
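As a quick illustration of how co_names relates to LOAD_GLOBAL (a sketch for Python 3; the function g and its attribute name are invented), an attribute access puts the attribute name into co_names even though no global of that name is involved. Filtering by opcode separates the two cases.

```python
import dis


def g(obj):
    # Hypothetical example: len is loaded as a global, items is an attribute.
    return len(obj.items)


names = g.__code__.co_names
print(names)  # contains both 'len' and 'items'

# Only names reached through LOAD_GLOBAL are global (or builtin) lookups.
global_names = {ins.argval
                for ins in dis.get_instructions(g)
                if ins.opname == "LOAD_GLOBAL"}
print(global_names)  # {'len'}
```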

Note that this isn't sufficient for nested functions (nor is the dis.dis method in @kindall's answer). In that case, you will need to look at constants too:

# Define a function containing a nested function
>>> def foo():
...    def bar():
...        return some_global

# It doesn't contain LOAD_GLOBAL, so .co_names is empty.
>>> import dis
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (<code object bar at 0x2b70440c84b0, file "<ipython-input-106-77ead3dc3fb7>", line 2>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               0 (bar)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE

# Instead, we need to walk the constants to find nested functions
# (if bar contained a nested function too, we'd need to recurse):
>>> from types import CodeType
>>> for constant in foo.__code__.co_consts:
...     if isinstance(constant, CodeType):
...         print constant.co_names
('some_global',)
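The recursion mentioned in the comment could look roughly like this on Python 3. This is a sketch under the assumption that every nested function appears as a code object in its parent's co_consts; the helper names iter_code and nested_globals are made up here.

```python
import dis
from types import CodeType

GLOBAL_OPS = ("LOAD_GLOBAL", "STORE_GLOBAL", "DELETE_GLOBAL")


def iter_code(code):
    """Yield code itself, then every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, CodeType):
            yield from iter_code(const)  # recurse into nested functions


def nested_globals(func):
    """Return sorted global names used by func or any function nested in it."""
    names = set()
    for code in iter_code(func.__code__):
        for ins in dis.get_instructions(code):
            if ins.opname in GLOBAL_OPS:
                names.add(ins.argval)
    return sorted(names)


def foo():
    def bar():
        return some_global
    return bar


print(nested_globals(foo))  # ['some_global']
```

Builtin names are not filtered out in this sketch, so a nested call to something like len would be reported alongside real module globals.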
Wilfred Hughes