Dynamically provide column name to a function via Dataframe.apply()

Question

I have a function "calc" that is called via apply(). How can I pass the pandas column name to calc dynamically, as an argument in the apply() call, instead of hard-coding 'AMOUNT' as in this case? Thanks.

def calc(row):
    factor = 3
    h_value = int(row['AMOUNT']) // 100
    output = h_value * factor
    return output

df1['BILL_VALUE'] = df1.apply(calc, axis=1)

I provided the function call used in the program for calc; will that not suffice? Basically, I have at least three other dataframes in the same program that each need a new column computed by calc. I can't seem to find a way to pass the column name into the function. I appreciate your help with this. – Shallunsard, Oct 16 '20 at 10:20

ky_aaaa · Accepted Answer · 2020-10-16T09:42:56.033

0

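You can use the kwargs parameter of DataFrame.apply. Any extra keyword arguments given to apply() are passed on to the function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html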

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)


def calc(row, param=''):
    factor = 3
    h_value = int(row[param]) // 100
    output = h_value * factor
    return output


df1['BILL_VALUE'] = df1.apply(calc, axis=1, param='AMOUNT')
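The signature above also lists an args=() parameter, so the column name could be passed positionally as well. The following is only a minimal sketch, assuming a small made-up df1; the names via_args and via_lambda are hypothetical and not part of the original answer.

```python
import pandas as pd


def calc(row, param):
    factor = 3
    h_value = int(row[param]) // 100
    return h_value * factor


# Made-up sample data, for illustration only.
df1 = pd.DataFrame({"AMOUNT": [1200, 100, 2425]})

# Extra positional arguments go through args, which should be a tuple
# (note the trailing comma in the one-element tuple).
via_args = df1.apply(calc, axis=1, args=("AMOUNT",))

# Alternatively, a lambda can close over the column name.
via_lambda = df1.apply(lambda row: calc(row, "AMOUNT"), axis=1)
```

Both calls should give the same values as the param='AMOUNT' keyword form; which one to use is mostly a matter of taste.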

edited Oct 16 '20 at 09:42

answered Oct 16 '20 at 09:22


I'm getting an error with your code: TypeError: calc() takes 2 positional arguments but 21 were given – Shallunsard Oct 16 '20 at 09:45
Are you calling it like this: `df1.apply(calc, axis=1, param='AMOUNT')`? Your function is defined correctly like this: `def calc(row, param=''):`. It is working in my environment. – ky_aaaa Oct 16 '20 at 12:22
Your latest code is working for me. Thanks for helping out. – Shallunsard Oct 16 '20 at 15:28
If possible mark it as answered and up-vote the answer. – ky_aaaa Oct 19 '20 at 07:15
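The failing call is not shown in the thread, so the actual cause of the TypeError reported above is unknown. One way a message of that shape can arise, offered purely as an assumption, is a string being unpacked into separate positional arguments, one per character. A plain-Python sketch of that mechanism (the dict row is a hypothetical stand-in for a DataFrame row):

```python
def calc(row, param):
    factor = 3
    h_value = int(row[param]) // 100
    return h_value * factor


# Hypothetical stand-in for a single DataFrame row.
row = {"AMOUNT": 1200}

try:
    # The * operator unpacks the string into six one-character arguments,
    # so calc receives seven positional arguments in total.
    calc(row, *"AMOUNT")
except TypeError as exc:
    message = str(exc)

# Passing the name as a single argument works as intended.
ok = calc(row, "AMOUNT")
```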

score 0 · Answer 2 · answered Oct 16 '20 at 10:56

You don't require df.apply here.

Performance precedence of operations, as per this answer:

1. vectorization
2. using a custom Cython routine
3. apply: a) reductions that can be performed in Cython, b) iteration in Python space
4. itertuples
5. iterrows
6. updating an empty frame (e.g. using loc one-row-at-a-time)

Change your function definition as below and pass appropriate arguments.

def calc(df, input_col, output_col, factor=3):
    df[output_col] = (df[input_col].astype("int") // 100) * factor

Example:

>>> import pandas as pd
>>> def calc(df, input_col, output_col, factor=3):
...     df[output_col] = (df[input_col].astype("int") // 100) * factor
... 
>>> df1 = pd.DataFrame([1200,100,2425], columns=["AMOUNT"])
>>> df1
   AMOUNT
0    1200
1     100
2    2425
>>> calc(df1, "AMOUNT", "BILL_VALUE")
>>> df1
   AMOUNT  BILL_VALUE
0    1200          36
1     100           3
2    2425          72
>>> df2 = pd.DataFrame([3123,55,420], columns=["AMOUNT"])
>>> calc(df2, "AMOUNT", "BILL_VALUE")
>>> df2
   AMOUNT  BILL_VALUE
0    3123          93
1      55           0
2     420          12
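Since the question mentions several dataframes, the same vectorized idea can be looped over them. This is a rough sketch only: the helper bill_value, the frames dict, and the column name "TOTAL" are all hypothetical, and this variant returns new frames rather than modifying the inputs in place as calc does above.

```python
import pandas as pd


def bill_value(amounts, factor=3):
    """Return (amounts // 100) * factor as a new Series; the input is untouched."""
    return (amounts.astype("int") // 100) * factor


# Hypothetical frames keyed by the name of their amount column.
frames = {
    "AMOUNT": pd.DataFrame({"AMOUNT": [1200, 100, 2425]}),
    "TOTAL": pd.DataFrame({"TOTAL": [3123, 55, 420]}),
}

# assign() returns a copy with the extra column, leaving each original frame as it was.
results = {
    col: frame.assign(BILL_VALUE=bill_value(frame[col]))
    for col, frame in frames.items()
}
```

Returning a new frame avoids surprising side effects when the same dataframe is reused elsewhere; the in-place version is fine if that is what you want.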

Reference:

Does pandas iterrows have performance issues?
