Let me preface this answer by saying this should not be done. If you want to have access to the results, then maintain a collection of results. First the solution you asked for but should not use (credits to Ivo Wetzel here for the lookup on the attribute names):
import inspect
import functools
import pandas as pd
def return_df_to_globals(prefix):
def _return_df_to_globals(f):
@functools.wraps(f)
def wrapped(df, *args, **kwargs):
frame = inspect.currentframe()
frame = inspect.getouterframes(frame)[1]
string = inspect.getframeinfo(frame[0]).code_context[0].strip()
assignments = string[string.find('(') + 1:-1].split(',')
df_input_name = next(v for k, v in map(lambda a: a.split("="), assignments) if k.strip() == "df")
ret = f(df, *args, **kwargs)
globals()["_".join([prefix, df_input_name])] = ret
return ret
return wrapped
return _return_df_to_globals
@return_df_to_globals(prefix="f")
def my_func(df, prop=True):
fd = df.value_counts(normalize=prop).reset_index()
fd.columns = ['feature','count','proportion']
return fd
df1 = pd.DataFrame({'C1':['A','A','B'], 'C2':[10,20,30]})
df2 = pd.DataFrame({'C1':['C','C','B'], 'C2':[100,200,300]})
my_func(prop=True, df=df1)
f_df1 # exists, with return value of the call.
On your question regarding why this is not advisible:
- To get to the argument names as needed, you need to inspect frame information from the interpreter stack. It is not intended for such uses and I am sure there will be corner-cases that break above example (maybe somebody else can elaborate).
- Separating commands and queries (see command query separation) is generally considered good style and avoids unwanted misconceptions about the system state. Your function both has side-effects (it adds to the global namespace) and returns the results of a query. Fowler's article mentions also valid exceptions to the principle - a cache might also be another good one.
- Along the lines of the last point: You could very easily override a name in the global namespace.
You have already made one improvement (providing the name explicitly). To not pollute or otherwise endanger the global namespace, here is a suggestion:
import inspect
import functools
def collec_result(f):
# check that function does not use parameter names used by cache
if {"collect_in", "collect_name"}.intersection(inspect.signature(f).parameters.keys()):
raise ValueError("Error: function signature contains parameters collect_in or collect_name.")
@functools.wraps(f)
def fun(*args, **kwargs):
collect_in = kwargs.pop("collect_in", None)
collect_name = kwargs.pop("collect_name", None)
ret = f(*args, **kwargs)
if collect_in is not None and collect_name is not None:
collect_in[collect_name] = ret
return ret
return fun
You can then decorate your functions with collec_result
and use collect_in
with a dictionary (modifications to SimpleNamespace
or similar also possible) and collect_name
using the naming strategy you employed also in your solution whenever you wish to write the result also to a dictionary:
results = {}
@collect_result
def foo(a, b):
return a+b
foo(1, 2,
collect_in=results,
collect_name="123")
results["123"] # 3
Of course, still better would be to just:
ret = foo(1, 2)
results["my_result"] = ret
Which then means that in whatever local scope (rendering all work above for naught) we could just:
my_result = foo(1, 2)
# or as in your case instead of func(df2, 'df2')
df2 = func(df2)
Then command-query-separation is adhered to. You don't need to silently modify the global namespace and are overall far more fault-resilient than otherwise.