
I would like to define a function make_function that returns a new function. It takes two arguments: a list args_names of argument names for the new function, and a function inner_func to be used in the definition of the new function. In this simple case, the new function just adds 5 to the output of inner_func.

I tried using eval():

from typing import Callable, List

def make_function(args_names: List[str], inner_func: Callable):
    args_str = ", ".join(args_names)
    # Build the source of a lambda that refers to inner_func by its name.
    expr: str = (
        "lambda "
        + args_str
        + ": "
        + inner_func.__name__
        + "("
        + args_str
        + ") + 5"
    )

    return eval(expr)

This works, but using eval is not recommended. For one thing, it is not easy to debug. Also, in my use case I need to call the new function from a place where inner_func is not in scope, which raises an error. Are there any other options?

Soap
  • It's possible, but it's really difficult. The problem is trivial if you use `*args`, so, why not just use `*args`? Does it really matter what the argument names are, if the only thing you will use them for is to pass them all, in the same order, into the `inner_func`? – wim Nov 03 '22 at 01:09
  • There is also the operator module, which can compose operations from constituents. See https://docs.python.org/3/library/operator.html#operator.add for example. And also [ast.literal_eval](https://docs.python.org/3/library/ast.html#ast.literal_eval). Not sure if it fits your use case. `eval` fits blackhats' use cases perfectly well, however: it is a massive security hole. – JL Peyret Nov 03 '22 at 03:36
  • @wim Unfortunately the exact names of the arguments do matter: there is a function of a package I'm using which extracts the names of the arguments and builds a graph with those names as nodes (the details don't matter here). – Soap Nov 03 '22 at 09:42

1 Answer


Perhaps exec is actually the best approach. eval/exec are usually not recommended when the input string is arbitrary or untrusted. However, when the code string that is eventually passed to the compiler was generated by you, and not directly supplied by the user, it can be fine. There are stdlib examples of this approach: namedtuple uses eval and dataclass uses exec, and nobody has figured out how to exploit them (yet).

Now, I think that in your case it's fairly easy to do a safe code generation + exec simply by verifying that the args_names passed in is truly a list of arg names, and not some arbitrary Python code.

from textwrap import dedent
from typing import List, Callable

def make_function(args_names: List[str], inner_func: Callable):

    for arg_name in args_names:
        if not arg_name.isidentifier():
            raise Exception(f"Invalid arg name: {arg_name}")

    args = ', '.join(args_names)

    code_str = dedent(f"""\
        def outer_func({args}):
            return inner_func({args}) + 5
    """)

    scope = {"inner_func": inner_func}
    exec(code_str, scope)
    return scope["outer_func"]

Demo:

>>> def orig(a, b):
...     return a + b + 1
... 
>>> func = make_function(args_names=["foo", "bar"], inner_func=orig)
>>> func(2, 3)
11
>>> func(foo=2, bar=3)
11
>>> func(foo=2, bar=3, baz=4)
TypeError: outer_func() got an unexpected keyword argument 'baz'
>>> func(foo=2)
TypeError: outer_func() missing 1 required positional argument: 'bar'

As desired, it continues to work even when a local reference to the inner_func is no longer available, since we made sure the reference was available during code gen:

>>> del orig
>>> func(foo=2, bar=3)
11
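Because the generated function has genuine named parameters, introspection tools see `foo` and `bar` rather than a catch-all `*args`. The sketch below assumes the graph-building package reads names through something like `inspect.signature`; the question does not say how it extracts them, so treat that as an assumption. `star_args_wrapper` is a hypothetical helper included only for comparison.

```python
import inspect
from textwrap import dedent
from typing import Callable, List


def make_function(args_names: List[str], inner_func: Callable):
    # Same definition as in the answer, repeated so this snippet runs on its own.
    for arg_name in args_names:
        if not arg_name.isidentifier():
            raise Exception(f"Invalid arg name: {arg_name}")
    args = ", ".join(args_names)
    code_str = dedent(f"""\
        def outer_func({args}):
            return inner_func({args}) + 5
    """)
    scope = {"inner_func": inner_func}
    exec(code_str, scope)
    return scope["outer_func"]


def star_args_wrapper(inner_func: Callable):
    # Hypothetical *args alternative: simpler, but it hides the parameter names.
    def outer_func(*args):
        return inner_func(*args) + 5
    return outer_func


def orig(a, b):
    return a + b + 1


generated_names = list(inspect.signature(make_function(["foo", "bar"], orig)).parameters)
star_names = list(inspect.signature(star_args_wrapper(orig)).parameters)
print(generated_names)  # ['foo', 'bar']
print(star_names)       # ['args']
```

If the package instead reads `__code__.co_varnames` or similar, the generated function should behave the same way, since it is an ordinary function object.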

Nefarious "argument names" are not allowed:

>>> make_function(["foo", "bar", "__import__('os')"], print)
Exception: Invalid arg name: __import__('os')
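One gap in the check is that `str.isidentifier()` also returns True for reserved words such as `class`, and it does not catch duplicate names. Neither is a security problem, since both only produce a SyntaxError inside exec, but a stricter validator gives a clearer error message. A minimal sketch follows; `validate_arg_names` is a hypothetical helper name, not part of the code above.

```python
import keyword
from typing import List


def validate_arg_names(args_names: List[str]) -> None:
    # Hypothetical helper: rejects non-identifiers, reserved words and duplicates.
    seen = set()
    for arg_name in args_names:
        if not arg_name.isidentifier() or keyword.iskeyword(arg_name):
            raise ValueError(f"Invalid arg name: {arg_name!r}")
        if arg_name in seen:
            raise ValueError(f"Duplicate arg name: {arg_name!r}")
        seen.add(arg_name)


validate_arg_names(["foo", "bar"])  # passes silently

for bad in (["class"], ["foo", "foo"], ["__import__('os')"]):
    try:
        validate_arg_names(bad)
    except ValueError as exc:
        print(exc)
```

Calling this at the top of make_function in place of the bare isidentifier() loop would be one way to use it.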

For an approach without code generation, it is also possible to instantiate types.FunctionType directly. To do this you need to pass it a types.CodeType instance, which is pretty difficult to create manually. Both types are public and documented, but the docstring for the code type even tries to scare you away:

>>> ((lambda: None).__code__.__doc__)
'Create a code object.  Not for the faint of heart.'

If you want to attempt it regardless, see "How to create a code object in python?", but I think you'll find that using eval, exec or compile is more convenient.
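A middle ground is to let `compile` build the code object and then pass it to types.FunctionType yourself. This is still code generation, so it is not the hand-built CodeType route described above; it only shows how the pieces fit together. The sketch relies on a CPython detail (the function's code object appears in the module code's `co_consts`), and `make_function_from_code` is a hypothetical name.

```python
import types
from typing import Callable, List


def make_function_from_code(args_names: List[str], inner_func: Callable):
    for arg_name in args_names:
        if not arg_name.isidentifier():
            raise Exception(f"Invalid arg name: {arg_name}")
    args = ", ".join(args_names)
    source = f"def outer_func({args}):\n    return inner_func({args}) + 5\n"

    # Compile the module-level source without executing it.
    module_code = compile(source, "<generated>", "exec")

    # Assumption: CPython keeps the function's code object among the constants.
    func_code = next(
        const
        for const in module_code.co_consts
        if isinstance(const, types.CodeType) and const.co_name == "outer_func"
    )

    # The second argument is the globals dict used for name lookup in the body.
    return types.FunctionType(func_code, {"inner_func": inner_func})


func = make_function_from_code(["foo", "bar"], lambda a, b: a + b + 1)
print(func(foo=2, bar=3))  # 11
```

The exec version does the same work in fewer steps, which is why it remains the simpler choice here.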

wim
  • Thanks for a great, complete answer. One question: I find it surprising that it still works after you delete `orig`. If I do the same with my code it doesn't work, as I said in the question statement. Does this have to do with `eval` vs `exec` or something else? – Soap Nov 03 '22 at 11:08
  • The difference is that you use `inner_func.__name__` in the lambda, so you're relying on the name binding. That name doesn't necessarily hang around. I'm using `scope = {"inner_func": inner_func}` so I pass the `inner_func` function instance itself. You can actually fix up the lambda approach by passing the second argument when you call `eval`, so that you are in control of the name lookup, but I think that using `exec` is cleaner and easier to understand here. – wim Nov 03 '22 at 17:03
  • Note that using `inner_func.__name__` also creates another unnecessary restriction: you can't pass in a lambda as the `inner_func` callable, it's an "anonymous" function and you will get a syntax error in the `eval` (it doesn't have a usable `__name__` attribute). So use instead something like `eval(..., {"_inner_func": inner_func})` to control the name lookup in your version. – wim Nov 03 '22 at 17:28