
Assume the following pandas dataframe:

A
1
1
2
4
10

And the following function:

def sum(A):
    return 2 + A

I am applying the sum function to a column of the dataframe as follows:

df['sum'] = df['A'].apply(sum)

How can I return the name of the function in another column? For instance, the expected output would look like this:

A   sum  func_name
1   3    sum
1   3    sum
2   4    sum
4   6    sum
10  12   sum

The reason is that I would like to keep track of which function was applied to each value.

anon
  • Well, it is already in your columns name. If you iterate over functions, you can get their name: `for func in funcs: df[func.__name__] = df['A'].apply(func)`. – Graipher Dec 18 '18 at 16:49
  • Please, please avoid overwriting Python builtin functions like `sum`. – Tarifazo Dec 18 '18 at 16:54
  • What happens if the function is anonymous? (`df['A'].apply(lambda a: 2 + a)`) – ernest_k Dec 18 '18 at 17:19
  • @ernest_k, This is *exactly* the reason why an **explicit** dictionary-based mapping should be preferred. I tried to explain this in my answer. – jpp Dec 20 '18 at 16:19
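
The points raised in these comments can be sketched together: loop over a list of functions and use each function's `__name__` as the column name. This is only an illustration; `df` is rebuilt from the question's data, and `add_two` and `double` are hypothetical names. It also shows that a lambda reports its name as `<lambda>`, which is the problem with anonymous functions.

```python
import pandas as pd

# Rebuild the frame from the question.
df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def add_two(a):   # hypothetical stand-in for the question's function
    return 2 + a

def double(a):    # hypothetical second function
    return 2 * a

funcs = [add_two, double]
for func in funcs:
    # Each result column is named after the function that produced it.
    df[func.__name__] = df['A'].apply(func)

# An anonymous function has no useful name to record.
lambda_name = (lambda a: 2 + a).__name__

print(df)
print(lambda_name)
```

With this layout the column header itself records which function was applied, so a separate func_name column is only needed when several functions write to the same column.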

3 Answers


You can use the inspect module, which lets a function look up its own name at run time:

import inspect
import pandas as pd

def SUM(A):
    # inspect.stack()[0][3] is the name of the currently executing function
    return pd.Series([2 + A, inspect.stack()[0][3]], index=['value', 'func_name'])

df['A'].apply(SUM)
Out[5]: 
   value func_name
0      3       SUM
1      3       SUM
2      4       SUM
3      6       SUM
4     12       SUM
BENY

If you want to get the function name, another option is the function's __name__ attribute. For example:

def mysum(X):
    return 2 + X

def foo(X, function):
    return pd.Series({
        function.__name__: function(X), 'func_name': function.__name__})

df.join(df.A.apply(foo, function=mysum))

    A  mysum func_name
0   1      3     mysum
1   1      3     mysum
2   2      4     mysum
3   4      6     mysum
4  10     12     mysum

def myprod(X):
    return 2 * X    

df.join(df.A.apply(foo, function=myprod))

    A  myprod func_name
0   1       2    myprod
1   1       2    myprod
2   2       4    myprod
3   4       8    myprod
4  10      20    myprod

I assume you are already familiar with the pitfalls of using apply this way. I've written this under the assumption that your function is a stand-in for something a lot more complex. But in general, you should try to vectorize where possible.
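
As a rough illustration of the vectorization advice, and assuming the function is simple arithmetic like mysum, the function can be called once on the whole column and its name recorded with assign. The helper name `with_func_name` is hypothetical, and `df` is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def mysum(X):
    return 2 + X

def with_func_name(frame, column, function):
    # Hypothetical helper: calls `function` once on the whole Series
    # (vectorized) instead of once per element through apply.
    return frame.assign(**{
        function.__name__: function(frame[column]),
        'func_name': function.__name__,  # scalar is broadcast to every row
    })

result = with_func_name(df, 'A', mysum)
print(result)
```

This only works when the function accepts a Series; a function that needs per-element Python logic would still have to go through apply.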


If you want more flexibility naming the output column, you can add a keyword argument name:

def foo(X, function, name=None):
    name = name if name else function.__name__
    return pd.Series({
        name: function(X), 'func_name': function.__name__})

df.join(df.A.apply(foo, function=mysum, name='sum'))

    A  sum func_name
0   1    3     mysum
1   1    3     mysum
2   2    4     mysum
3   4    6     mysum
4  10   12     mysum
cs95

If you need to use the name of your function, use a dictionary as a dispatcher. This is clean and reliable. It avoids having to shadow the built-in sum function, which is not recommended.

def summer(A):
    return 2 + A

def apply_func(s, func):
    d = {'sum': summer}
    return s.apply(d[func]), func

df['sum'], df['func_name'] = apply_func(df['A'], 'sum')

print(df)

    A  sum func_name
0   1    3       sum
1   1    3       sum
2   2    4       sum
3   4    6       sum
4  10   12       sum

With Pandas, you should avoid pd.Series.apply, as this represents an inefficient Python-level loop. In this case, your function can be trivially vectorised by redefining apply_func:

def apply_func(s, func):
    d = {'sum': summer}
    return d[func](s), func
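
To show how the explicit mapping extends to more than one function, here is a minimal sketch with two entries in a dispatcher defined once at module level. The names `FUNCS` and `doubler`, the 'double' key, and the `_func_name` column suffix are hypothetical additions, and `df` is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 4, 10]})

def summer(A):
    return 2 + A

def doubler(A):   # hypothetical second function
    return 2 * A

# Explicit mapping from a label to the function it stands for.
FUNCS = {'sum': summer, 'double': doubler}

def apply_func(s, func):
    # Vectorized call on the whole Series; an unknown label raises KeyError.
    return FUNCS[func](s), func

for key in FUNCS:
    df[key], df[key + '_func_name'] = apply_func(df['A'], key)

print(df)
```

Because the label is chosen by the caller rather than read from the function object, this also works for lambdas, which have no meaningful `__name__`.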
jpp