While using pandas apply() how to return the name of the function in a column?

Question

Assume the following pandas dataframe:

    A
0   1
1   1
2   2
3   4
4  10

And the following function:

def sum(A):
    return 2 + A

I am applying the sum function to the pandas dataframe as follows:

df['sum'] = df['A'].apply(sum)

How can I return the name of the function in another column? For instance, the expected output would look like this:

A   sum func_name
1   3   sum
1   3   sum
2   4   sum
4   6   sum
10  12  sum

The reason is that I would like to keep track of what was applied to each value.

Well, it is already in your columns name. If you iterate over functions, you can get their name: `for func in funcs: df[func.__name__] = df['A'].apply(func)`. — Graipher, Dec 18 '18 at 16:49
please, please, avoid overwriting python builtin functions like `sum` — Tarifazo, Dec 18 '18 at 16:54
What happens if the function is anonymous? (`df['A'].apply(lambda a: 2 + a)`) — ernest_k, Dec 18 '18 at 17:19
@ernest_k, This is *exactly* the reason why an **explicit** dictionary-based mapping should be preferred. Tried to explain this in my answer. — jpp, Dec 20 '18 at 16:19
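
The loop suggested in the first comment, together with the anonymous-function caveat raised after it, can be sketched as follows. This is a minimal sketch: the data is assumed from the expected output in the question, and add_two and times_two are hypothetical names, not from the question.

```python
import pandas as pd

# Assumed sample data, reconstructed from the expected output in the question.
df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def add_two(a):    # hypothetical stand-in for the question's function
    return 2 + a

def times_two(a):  # hypothetical second function
    return 2 * a

funcs = [add_two, times_two]
for func in funcs:
    # Each result column is named after the function that produced it.
    df[func.__name__] = df['A'].apply(func)

# An anonymous function carries no useful name of its own.
anon_name = (lambda a: 2 + a).__name__
```

Here anon_name is '<lambda>', so a column named this way says nothing about what the function does.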

BENY · Answer 1 · 2018-12-18T16:57:44.450

2

You can use the inspect module so that the function reports its own name:

import inspect
import pandas as pd

def SUM(A):
    # inspect.stack()[0][3] is the name of the currently executing function
    return pd.Series([2 + A, inspect.stack()[0][3]], index=['value', 'func_name'])

df['A'].apply(SUM)
Out[5]: 
   value func_name
0      3       SUM
1      3       SUM
2      4       SUM
3      6       SUM
4     12       SUM
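
A self-contained variant of the same idea, offered only as a sketch. It assumes the data from the question, uses inspect.currentframe() instead of inspect.stack() so that only the current frame is inspected, and returns a plain tuple rather than a pd.Series per row. add_two is a hypothetical name.

```python
import inspect
import pandas as pd

# Assumed sample data, reconstructed from the expected output in the question.
df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def add_two(a):  # hypothetical name
    # f_code.co_name is the name the function was defined with.
    name = inspect.currentframe().f_code.co_name
    return 2 + a, name

# apply() yields one (value, name) tuple per row; zip(*...) splits them apart.
values, names = zip(*df['A'].apply(add_two))
df['value'] = list(values)
df['func_name'] = list(names)
```

Note that inspect.currentframe() relies on interpreter stack frame support, which CPython provides but some other implementations may not.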

edited Dec 18 '18 at 16:57

answered Dec 18 '18 at 16:52


cs95 · Accepted Answer · 2018-12-18T16:58:57.143

If you want to get the function name, another option is using f.__name__. Example:

def mysum(X):
    return 2 + X

def foo(X, function):
    return pd.Series({
        function.__name__: function(X), 'func_name': function.__name__})

df.join(df.A.apply(foo, function=mysum))

    A  mysum func_name
0   1      3     mysum
1   1      3     mysum
2   2      4     mysum
3   4      6     mysum
4  10     12     mysum

def myprod(X):
    return 2 * X

df.join(df.A.apply(foo, function=myprod))

    A  myprod func_name
0   1       2    myprod
1   1       2    myprod
2   2       4    myprod
3   4       8    myprod
4  10      20    myprod

I assume you are already familiar with the pitfalls of using apply this way. I've written this under the assumption that your function is a stand-in for something a lot more complex. But in general, you should try to vectorize where possible.
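
For a function as simple as the one in the question, the vectorized form might look like the sketch below. It assumes the data from the question and that the function body works on a whole Series at once, which holds for arithmetic such as 2 + X but not for arbitrary Python logic.

```python
import pandas as pd

# Assumed sample data, reconstructed from the expected output in the question.
df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def mysum(X):
    return 2 + X

# Call the function once on the whole column instead of once per row;
# the scalar function name is broadcast to every row.
out = df.assign(**{mysum.__name__: mysum(df['A']),
                   'func_name': mysum.__name__})
```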

If you want more flexibility naming the output column, you can add a keyword argument name:

def foo(X, function, name=None):
    name = name if name else function.__name__
    return pd.Series({
        name: function(X), 'func_name': function.__name__})

df.join(df.A.apply(foo, function=mysum, name='sum'))

    A  sum func_name
0   1    3     mysum
1   1    3     mysum
2   2    4     mysum
3   4    6     mysum
4  10   12     mysum
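
The name argument also gives a readable label when the function is anonymous, the case raised in the comments. A small sketch calling foo on a single value; the lambda is a hypothetical stand-in for any unnamed function.

```python
import pandas as pd

def foo(X, function, name=None):
    name = name if name else function.__name__
    return pd.Series({
        name: function(X), 'func_name': function.__name__})

# With an explicit name, the value gets a readable label.
named = foo(1, function=lambda x: 2 + x, name='sum')

# Without it, the label falls back to the lambda's placeholder name.
unnamed = foo(1, function=lambda x: 2 + x)
```

In both cases func_name still records '<lambda>', since that is all __name__ holds for an anonymous function.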

jpp · Answer 3 · 2018-12-18T17:08:52.167

If you need to use the name of your function, use a dictionary as a dispatcher. This is clean and reliable. It avoids having to shadow the built-in sum function, which is not recommended.

def summer(A):
    return 2 + A

def apply_func(s, func):
    d = {'sum': summer}
    return s.apply(d[func]), func

df['sum'], df['func_name'] = apply_func(df['A'], 'sum')

print(df)

    A  sum func_name
0   1    3       sum
1   1    3       sum
2   2    4       sum
3   4    6       sum
4  10   12       sum

With Pandas, you should avoid pd.Series.apply, as this represents an inefficient Python-level loop. In this case, your function can be trivially vectorised by redefining apply_func:

def apply_func(s, func):
    d = {'sum': summer}
    return d[func](s), func
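
Put together, the vectorised dispatcher could look like the sketch below. It assumes the data from the question. The 'prod' key and the doubler function are hypothetical additions to show a second entry, and moving the dictionary to module level is a choice of this sketch, not part of the answer above.

```python
import pandas as pd

# Assumed sample data, reconstructed from the expected output in the question.
df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def summer(A):
    return 2 + A

def doubler(A):  # hypothetical second function
    return 2 * A

# Explicit mapping from label to function; labels are chosen by the caller,
# so anonymous functions and shadowed built-ins are not a concern.
DISPATCH = {'sum': summer, 'prod': doubler}

def apply_func(s, func):
    # Look the function up by label and call it once on the whole Series.
    return DISPATCH[func](s), func

df['sum'], df['func_name'] = apply_func(df['A'], 'sum')
```

An unknown label raises KeyError at the dictionary lookup, which surfaces typos immediately.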
