Python - Get text of arguments passed to a function and use it to assign global variables

Question

I've created a function that takes a dataframe as an argument. I can get different results from the function by passing a different dataframe. I want to get the name of the dataframe variable passed to the function, as text.

def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    return fd
# Note: I pass this function within another function that builds a graph using fd

import pandas as pd
df1 = pd.DataFrame({'C1':['A','A','B','D']})
df2 = pd.DataFrame({'C1':['C','C','B','D']})

my_func(df2)
#   feature  proportion
# 0       C        0.50
# 1       D        0.25
# 2       B        0.25

Desired Functionality
I want to be able to save fd for the dataframes under the names df1_fd and df2_fd right within the function my_func, so that I can then access them globally. So I figured that if there were a way to get the text of the arguments passed to a function, I could use that to create global variables from within my_func. Like so:

def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    df_name = get_text_of_arg()[0]       # here I want the code that gets 
                                         # 'df1' or 'df2', whatever df is used in function
    global df_name_fd          # create unique name as global variable
    df_name_fd = fd            # save fd with unique name
    return fd

my_func(df2)     # returns fd for df2 and saves it under the unique name df2_fd
#   feature  proportion
# 0       C        0.50
# 1       D        0.25
# 2       B        0.25

# Accessing the fd for df2
df2_fd
#   feature  proportion
# 0       C        0.50
# 1       D        0.25
# 2       B        0.25

this is answered [here](https://stackoverflow.com/questions/2749796/how-to-get-the-original-variable-name-of-variable-passed-to-a-function) but not recommended in any way. — sim, Jan 27 '21 at 00:49
I've read similar disclaimers elsewhere as well. Why is it not advisable? — , Jan 27 '21 at 01:07

sim · Answer 1 · 2021-01-27T02:18:09.710

Let me preface this answer by saying this should not be done. If you want to have access to the results, then maintain a collection of results. First the solution you asked for but should not use (credits to Ivo Wetzel here for the lookup on the attribute names):

import inspect
import functools
import pandas as pd


def return_df_to_globals(prefix):
    def _return_df_to_globals(f):
        @functools.wraps(f)
        def wrapped(df, *args, **kwargs):
            frame = inspect.currentframe()
            frame = inspect.getouterframes(frame)[1]
            string = inspect.getframeinfo(frame[0]).code_context[0].strip()
            assignments = string[string.find('(') + 1:-1].split(',')
            df_input_name = next(v for k, v in map(lambda a: a.split("="), assignments) if k.strip() == "df")
            ret = f(df, *args, **kwargs)
            globals()["_".join([prefix, df_input_name])] = ret
            return ret
        return wrapped
    return _return_df_to_globals

@return_df_to_globals(prefix="f")
def my_func(df, prop=True):
    fd = df.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','count','proportion']
    return fd

df1 = pd.DataFrame({'C1':['A','A','B'], 'C2':[10,20,30]})
df2 = pd.DataFrame({'C1':['C','C','B'], 'C2':[100,200,300]})
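# df must be passed by keyword (df=...): the wrapper parses "name=value" pairs from the call text
my_func(prop=True, df=df1)
f_df1  # exists, with the return value of the call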

On your question regarding why this is not advisable:

To get the argument names, you need to inspect frame information from the interpreter stack. It is not intended for such uses, and I am sure there are corner cases that break the example above (maybe somebody else can elaborate).
Separating commands and queries (see command-query separation) is generally considered good style and avoids misconceptions about the system state. Your function both has side effects (it adds to the global namespace) and returns the result of a query. Fowler's article also mentions valid exceptions to the principle; a cache might be another good one.
Along the lines of the last point: you could very easily overwrite a name in the global namespace.
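
The overwriting risk can be illustrated with a minimal sketch. The helper save_to_globals and the variable threshold are hypothetical names, not part of the code above; the helper merely stands in for any function that writes its result into the global namespace under a computed name.

```python
def save_to_globals(name, value):
    # Hypothetical helper: stores value in the module namespace under `name`.
    globals()[name] = value


threshold = 0.5  # an existing global that the rest of the program relies on

# A computed name that happens to collide replaces it without any warning.
save_to_globals("threshold", "some unrelated result")

print(type(threshold).__name__)  # str, no longer a float
```

Nothing in the call signals that an existing variable was replaced, which is what makes this kind of bug hard to trace.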

You have already made one improvement (providing the name explicitly). To not pollute or otherwise endanger the global namespace, here is a suggestion:

import inspect
import functools

def collect_result(f):
    # check that function does not use parameter names used by the decorator
    if {"collect_in", "collect_name"}.intersection(inspect.signature(f).parameters.keys()):
        raise ValueError("Error: function signature contains parameters collect_in or collect_name.")

    @functools.wraps(f)
    def fun(*args, **kwargs):
        # remove the decorator's own keyword arguments before calling f
        collect_in = kwargs.pop("collect_in", None)
        collect_name = kwargs.pop("collect_name", None)
        ret = f(*args, **kwargs)
        # store the result only if both a target and a name were given
        if collect_in is not None and collect_name is not None:
            collect_in[collect_name] = ret
        return ret
    return fun

You can then decorate your functions with collect_result and pass a dictionary as collect_in (a modification for SimpleNamespace or similar is also possible) and a name as collect_name, following the naming strategy from your own solution, whenever you also want to write the result to the dictionary:

results = {}

@collect_result
def foo(a, b):
    return a+b

foo(1, 2, 
    collect_in=results,
    collect_name="123")

results["123"]  # 3
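
The SimpleNamespace modification mentioned above could look roughly like the following. This is only a sketch under assumptions: collect_attr, results_ns and add are hypothetical names, and the only change from the dictionary version is that the result is stored with setattr instead of item assignment.

```python
import functools
from types import SimpleNamespace


def collect_attr(f):
    # Hypothetical variant of the decorator: stores results as attributes.
    @functools.wraps(f)
    def fun(*args, collect_in=None, collect_name=None, **kwargs):
        ret = f(*args, **kwargs)
        # Store the result only if both a target and a name were given.
        if collect_in is not None and collect_name is not None:
            setattr(collect_in, collect_name, ret)
        return ret
    return fun


results_ns = SimpleNamespace()


@collect_attr
def add(a, b):
    return a + b


add(1, 2, collect_in=results_ns, collect_name="total")
print(results_ns.total)  # 3
```

Attribute access reads a little more like the global variables originally asked for, while the results still live in one object that can be passed around or discarded.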

Of course, still better would be to just:

ret = foo(1, 2)
results["my_result"] = ret

Which then means that in whatever local scope (rendering all work above for naught) we could just:

my_result = foo(1, 2)
# or, in your case, instead of func(df2, 'df2')
df2_fd = func(df2)

Then command-query separation is adhered to. You don't need to silently modify the global namespace, and the code is overall far more fault-resilient.
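
For several inputs, the same idea extends to one explicit collection of results. The sketch below is an illustration under assumptions, not the original code: proportions is a hypothetical stand-in for my_func built on collections.Counter so that it runs without pandas, and the lists mirror column C1 of df1 and df2 from the question.

```python
from collections import Counter


def proportions(values):
    # Stand-in for my_func: relative frequency of each distinct value.
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical inputs mirroring column C1 of df1 and df2 in the question.
inputs = {
    "df1": ["A", "A", "B", "D"],
    "df2": ["C", "C", "B", "D"],
}

# One dictionary holds every result; the keys are chosen explicitly by the caller.
fd = {name: proportions(values) for name, values in inputs.items()}

print(fd["df2"])  # {'C': 0.5, 'B': 0.25, 'D': 0.25}
```

The names are supplied where the inputs are defined, so no stack inspection is needed and nothing is written to the global namespace behind the caller's back.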

Thanks for the answer. My intent was a) to find some one-liner that would give me the text of the function, b) which I can then use to assign as variable. Why is this not advisable though? — , Jan 27 '21 at 01:31
@gyaan.anveyshak: I have added to my answer, I hope that gives a bit of an intuition - a lot (aside from misusing the interpreter frameinfo, which is just not intended for such uses as in this example) comes down to common pitfalls in achieving maintainable and (hopefully) fault-tolerant code at scale. Here is a discussion on [whether or not global variables are bad](https://stackoverflow.com/questions/484635/are-global-variables-bad). — sim, Jan 27 '21 at 02:05

score 0 · Answer 2 · 2021-01-27T01:32:20.460

Here is a way that takes an extra argument.

def func(df, name=None, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    if name is not None:
        globals()[name+'_fd'] = fd
    return fd

import pandas as pd
df1 = pd.DataFrame({'C1':['A','A','B','D']})
df2 = pd.DataFrame({'C1':['C','C','B','D']})

func(df2, 'df2')
#   feature  proportion 
# 0       C        0.50 
# 1       B        0.25 
# 2       D        0.25

df2_fd
#   feature  proportion 
# 0       C        0.50 
# 1       B        0.25 
# 2       D        0.25

func(df1)  # will not save separately
#   feature  proportion
# 0       A        0.50
# 1       B        0.25
# 2       D        0.25

df1_fd
# NameError: name 'df1_fd' is not defined
