
Use case: I am taking Python code created in another system, breaking it up into individual functions, and connecting them together. The entire point of this work is to break up large python functions that we did not write into smaller python functions for many business reasons.

I COULD take the code, parse for variables, and arbitrarily put them in a dict when doing this, but that is more than a teeny bit of work, and I'd like to run this to ground before I do.
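For a sense of how much work the "parse for variables" route is, a rough first pass with the standard-library `ast` module might look like the sketch below. `names_read_and_written` is a hypothetical helper, not part of any existing tool, and it deliberately ignores scoping details (nested functions, comprehensions, imports, `global` statements), so it only approximates which names a chunk reads and which it binds.

```python
import ast


def names_read_and_written(source: str) -> tuple[set[str], set[str]]:
    """Rough pass: collect bare names a chunk loads and bare names it binds.

    Approximation only: import aliases and def/class names are not ast.Name
    nodes, and comprehension variables are counted as if they were top-level.
    """
    read: set[str] = set()
    written: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                read.add(node.id)
            else:  # ast.Store or ast.Del
                written.add(node.id)
    return read, written


print(names_read_and_written("b = b + 20"))
print(names_read_and_written("c = b * config['factor']"))
```

The "read" set of one chunk is roughly what the previous chunk would have to hand over; the "written" set is what this chunk would hand on.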

I understand we should almost never do this, but I need to: because I am code-generating wrappers for functions I did not write, I need to dynamically create variables inside a function. I also can't simply use exec to generate an assignment statement, because the value could be a complex structure (e.g., a dict).

So, the point of what we're doing is to ask the original authors to make no changes to the incoming code while still executing it across several independent entities.

Just as in the example below, we capture as much state as we can when the first function exits (ideally functions, lambdas and all variables) and reinstate it in the second function, so that two functions which formerly shared the same scope and context can execute with no changes.
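The capture-and-reinstate round trip itself can be sketched with the standard library alone. This sketch swaps `dill` for `pickle` (an assumption: plain `pickle` is enough for the values involved; unlike `dill`, it cannot serialize lambdas, so those are skipped), and `export_context` / `import_context` are hypothetical names. This half of the approach works; the trouble in the code below lies in how the restored values are put back into scope.

```python
import base64
import math
import pickle


def export_context(namespace: dict) -> str:
    """Pickle every public, picklable name in `namespace` into one URL-safe string."""
    exported = {}
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # skip private/dunder names such as __builtins__
        try:
            exported[name] = pickle.dumps(value)
        except Exception:
            continue  # modules, lambdas, open files etc. cannot be pickled: skip them
    return base64.urlsafe_b64encode(pickle.dumps(exported)).decode("ascii")


def import_context(encoded: str) -> dict:
    """Reverse export_context: decode the string and unpickle each captured value."""
    exported = pickle.loads(base64.urlsafe_b64decode(encoded))
    return {name: pickle.loads(blob) for name, blob in exported.items()}


captured = export_context({
    "b": 100,
    "data": {"k": [1, 2]},   # complex values survive the round trip
    "_private": 1,           # skipped by the underscore filter
    "math": math,            # skipped: modules are not picklable
    "double": lambda x: x * 2,  # skipped: plain pickle cannot serialize lambdas
})
restored = import_context(captured)
print(restored)  # {'b': 100, 'data': {'k': [1, 2]}}
```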

Here is a single block of reproducible code (everything not related to b is code that I can use to wrap the assignment):

Original:

def original_function():
    b = 100
    b = b + 20

Resulting generated functions:

def fun_1() -> tuple:
    import dill
    from base64 import urlsafe_b64decode, urlsafe_b64encode
    from types import ModuleType

    b = 100

    locals_keys = frozenset(locals().keys())
    global_keys = frozenset(globals().keys())
    __context_export = {}
    for val in locals_keys:
        # Test the value's type, not the name's (val is always a str)
        if not val.startswith("_") and not isinstance(locals()[val], ModuleType):
            __context_export[val] = dill.dumps(locals()[val])

    for val in global_keys:
        if not val.startswith("_") and not isinstance(globals()[val], ModuleType):
            __context_export[val] = dill.dumps(globals()[val])

    b64_string = str(urlsafe_b64encode(dill.dumps(__context_export)), encoding="ascii")

    from collections import namedtuple

    output = namedtuple("FuncOutput", ["context"])
    return output(b64_string)


def fun_2(context):
    import dill
    from base64 import urlsafe_b64encode, urlsafe_b64decode
    from types import ModuleType

    __base64_decode = urlsafe_b64decode(context)
    __context_import_dict = dill.loads(__base64_decode)
    for k in __context_import_dict:
        val = dill.loads(__context_import_dict[k])
        if globals().get(k) is None and not isinstance(val, ModuleType):
            globals()[k] = val

    b = b + 20


output = fun_1()
fun_2(output[0])

The error I get when I run this is:

UnboundLocalError: local variable 'b' referenced before assignment
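Why the error happens: Python decides at compile time which names are local to a function. Because fun_2 contains the assignment `b = b + 20`, b is a local variable for the entire body of fun_2, so the read on the right-hand side never consults `globals()`, even though the loop just above put b there. The sketch below reproduces this in isolation; `declares_global` shows one possible workaround, assuming the wrapper generator can emit a `global` line for each captured name the chunk assigns. The same rule explains why `vars(sys.modules[__name__]).update(context)`, suggested in the comments, still ends in UnboundLocalError.

```python
def reads_only():
    return b + 20  # b is never assigned here, so the compiler treats it as a global


def reads_and_assigns():
    b = b + 20  # this assignment makes b local for the WHOLE function body
    return b


def declares_global():
    global b  # hypothetical generated line: bind b to the module global instead
    b = b + 20
    return b


globals()["b"] = 100  # simulate what fun_2's loop does with the captured context

print(reads_only())  # 120
try:
    reads_and_assigns()
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
print(declares_global())  # 120 (the module-level b is now 120 too)
```

A `global` statement makes the chunk read and write the module namespace, so chunks sharing a module can overwrite each other's names; that is the same side-effect risk raised in the comments.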

Thank you all for the help!

aronchick
  • It might be useful to explain what you mean by "code generating wrappers for functions you did not write", because you might be falling into the [XY problem](https://meta.stackexchange.com/a/66378/). Also you need to clarify what didn't work in your solution 2, please demonstrate using a [minimal and reproducible example](https://stackoverflow.com/help/mcve). – metatoaster May 13 '21 at 01:46
  • Also if you are really trying to set a global variable in a different module, you might want to refer [to this answer](https://stackoverflow.com/a/15959638/), and also [this thread](https://stackoverflow.com/a/46023970/). If your "location #2" is in fact the imported module `dill`, you only need to set `dill.b = 25`. – metatoaster May 13 '21 at 01:52
  • Why can't you store a dict inside another one? `globals()` is a dict after all. Can you give an example of the functions you did not write and the wrapper API you envisioned? – Selcuk May 13 '21 at 01:53
  • Please justify your claim that you "need to dynamically create variables inside a function", as it sounds very "iffy"… – martineau May 13 '21 at 02:07
  • It is *impossible* (outside of deeply hacking the runtime) to dynamically create a local variable in a CPython function. You really need to give us some details about what you are trying to do exactly. Note, the code you posted *should* work, btw. – juanpa.arrivillaga May 13 '21 at 02:09
  • Ok, added some more context - please let me know if you have any more questions! – aronchick May 13 '21 at 03:06
  • *"The entire point of this work is to break up large python functions that we did not write into smaller python functions for many business reasons."* So if those smaller functions are functions that you (or your colleagues) are writing, then you should give them parameters so that they can receive whatever data they need, instead of them getting that data via global variables. – kaya3 May 13 '21 at 03:11
  • That's not what Postel's law means at all. – user2357112 May 13 '21 at 03:29
  • Being liberal in what you accept is very different from "have the original code execute with no changes". You say you're doing code generation. Changing the original code doesn't mean you've restricted the inputs you can accept; it means you've *done code generation*. – user2357112 May 13 '21 at 03:33
  • If all you are after is injecting some variable into a function, [this package](https://github.com/objcode/python-inject) may point you towards what you are after. – metatoaster May 13 '21 at 03:36
  • Question: in your actual working example (that produced the `UnboundLocalError`), does `file1.main()` return `context` and that is then passed to `file2.main(...)`? If so, `file2.main` could start with `vars(sys.modules[__name__]).update(context)` (lifted directly from [this comment](https://stackoverflow.com/questions/5036700/h#comment96489690_16921809) from your first link) to bring assignments from context into that module. Most certainly may lead to other side effects (like possibly overriding imports in the `file2` module) so I honestly cannot recommend this, but if that's what you want.. – metatoaster May 13 '21 at 04:46
  • This makes sense, and I can see the variable in the vars() dict afterwards, but I'm still getting this error. Is there a global name lookup that I need to update after doing this? – aronchick May 13 '21 at 17:31
  • You again did not include the full code context nor the traceback, it makes it extremely difficult to infer what exactly you are doing or what you are currently after. I would guess you may be dealing with similar issue to [this thread](https://stackoverflow.com/questions/2609518/unboundlocalerror-with-nested-function-scopes). Anyway, if a design goal is fraught with such issues it may be that the design goal is flawed, or that Python is not a suitable language/runtime for the design at hand. – metatoaster May 14 '21 at 07:43
  • My apologies - I've now broken it down into a single block of code that can reproduce the effect and what I'm looking for - my goal is to continue using 'b' and NOT move to a dict. – aronchick May 14 '21 at 16:03
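kaya3's suggestion can still leave the external statement `b = b + 20` untouched: if the generator emits each chunk's incoming names as parameters, b is an ordinary local (a parameter), so the assignment just works and nothing is moved into a dict inside the chunk. A minimal sketch; the generated signatures and return dicts are hypothetical, and the generator would still need to know which names each chunk reads and writes, which is the name analysis the question hoped to avoid.

```python
def fun_1() -> dict:
    # CODE FROM EXTERNAL
    b = 100
    # END CODE
    return {"b": b}  # hypothetical generated exit: hand back names later chunks need


def fun_2(b) -> dict:
    # CODE FROM EXTERNAL (unchanged)
    b = b + 20
    # END CODE
    return {"b": b}


state = fun_1()
state = fun_2(**state)  # captured names arrive as keyword arguments
print(state["b"])  # 120
```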

1 Answer


Ok, this was a pretty easy fix once I understood the issue. To be honest, it makes even more sense this way: since I'm getting the code from an external source (as a string), I should mount the necessary global variables and exec the code inside a closed environment.

TO BE CLEAR - this is executing inside the USER'S environment, so security is not an issue. But this works!

def fun_1() -> tuple:
    import dill
    from base64 import urlsafe_b64decode, urlsafe_b64encode
    from types import ModuleType, FunctionType

    # CODE FROM EXTERNAL

    b = 100

    # END CODE

    locals_keys = frozenset(locals().keys())
    global_keys = frozenset(globals().keys())
    __context_export = {}
    for val in locals_keys:
        value = locals()[val]
        # Test the value's type, not the name's (val is always a str)
        if not val.startswith("_") and not isinstance(value, (ModuleType, FunctionType)):
            __context_export[val] = dill.dumps(value)

    for val in global_keys:
        value = globals()[val]
        if not val.startswith("_") and not isinstance(value, (ModuleType, FunctionType)):
            __context_export[val] = dill.dumps(value)

    b64_string = str(urlsafe_b64encode(dill.dumps(__context_export)), encoding="ascii")

    from collections import namedtuple

    output = namedtuple("FuncOutput", ["context"])
    return output(b64_string)


def fun_2(context):
    import dill
    from base64 import urlsafe_b64decode

    __base64_decode = urlsafe_b64decode(context)
    __context_import_dict = dill.loads(__base64_decode)

    variables = {}
    for k in __context_import_dict:
        variables[k] = dill.loads(__context_import_dict[k])

    loc = {}

    # CODE FROM EXTERNAL

    inner_code_to_execute = "b = b + 20"

    # END CODE 

    # Name lookups fall back to `variables` (globals); assignments land in `loc` (locals)
    exec(inner_code_to_execute, variables, loc)
    print(loc["b"])


output = fun_1()
fun_2(output[0])
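The same idea reduced to its core, as a sketch with a hypothetical `run_chunk` helper and made-up context values: the captured values become the globals dict passed to `exec`, and a separate locals dict collects whatever the chunk assigns. Because values are passed as objects rather than spliced into the source text, complex structures such as dicts work fine.

```python
def run_chunk(source: str, context: dict) -> dict:
    """Run one chunk of external source against captured context.

    Returns only the names the chunk assigned at top level.
    """
    namespace = dict(context)  # the chunk's globals: name lookups fall back here
    assigned: dict = {}        # the chunk's locals: top-level assignments land here
    exec(source, namespace, assigned)
    return assigned


# Chain two chunks: the first chunk's assignments feed the second chunk's context.
context = {"b": 100, "config": {"factor": 2}}  # hypothetical captured state
step_1 = run_chunk("b = b + 20", context)
context = {**context, **step_1}
step_2 = run_chunk("c = b * config['factor']", context)
print(step_1["b"], step_2["c"])  # 120 240
```

One design caveat: with separate globals and locals dicts, `exec` runs the code as if it were a class body, so a function defined inside a chunk cannot see names that the same chunk assigned at top level. Passing a single dict for both avoids that, at the cost of no longer separating inputs from outputs.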

aronchick