I have a function "calc" that is called via apply(). How can I pass the pandas column name to calc dynamically as an argument in my apply() call, instead of hard-coding 'AMOUNT' as in this case? Thanks.

def calc(row):
    factor = 3
    h_value = int(row['AMOUNT']) // 100
    output = h_value * factor
    return output
df1['BILL_VALUE'] = df1.apply(calc, axis=1) 
Shallunsard
  • I provided the function call used in the program for calc; will that not suffice? Basically, I have at least three other dataframes in the same program that each need a new column computed by calc. I can't seem to find a way to pass the column name into the function. I'd appreciate your help with this. – Shallunsard Oct 16 '20 at 10:20

2 Answers

You can use the keyword-arguments parameter (**kwds) of apply(), which forwards extra keyword arguments to your function: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)


def calc(row, param=''):
    factor = 3
    h_value = int(row[param]) // 100
    output = h_value * factor
    return output


df1['BILL_VALUE'] = df1.apply(calc, axis=1, param='AMOUNT') 
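
The signature above also lists args=(), so the column name can be passed positionally as well. Here is a minimal, self-contained sketch of both forms; the sample data is made up for illustration and is not from the question.

```python
import pandas as pd

def calc(row, param):
    # Same arithmetic as above: whole hundreds in the chosen column, times 3.
    factor = 3
    return (int(row[param]) // 100) * factor

# Hypothetical sample data, only for illustration.
df1 = pd.DataFrame({"AMOUNT": [1200, 100, 2425]})

# Keyword form: extra keyword arguments are forwarded to calc.
by_keyword = df1.apply(calc, axis=1, param="AMOUNT")

# Positional form: args must be a tuple, hence the trailing comma.
by_position = df1.apply(calc, axis=1, args=("AMOUNT",))

print(by_keyword.tolist())   # [36, 3, 72]
print(by_position.tolist())  # [36, 3, 72]
```

Either form should work for the other dataframes too; only the column name passed in changes.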

ky_aaaa

You don't require df.apply here.

Performance precedence of operations (fastest first), as per the answer referenced below:

  1. vectorization
  2. using a custom Cython routine
  3. apply: a) reductions that can be performed in Cython, b) iteration in Python space
  4. itertuples
  5. iterrows
  6. updating an empty frame (e.g. using loc one-row-at-a-time)
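
To see the gap between item 1 and item 3b on your own machine, a rough timing sketch like the following may help. The data size, the random amounts, and the helper names with_apply and vectorized are assumptions made for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 random amounts, only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({"AMOUNT": rng.integers(0, 10_000, size=10_000)})

def calc(row, param):
    return (int(row[param]) // 100) * 3

def with_apply():
    # Item 3b: calls calc once per row in Python space.
    return df.apply(calc, axis=1, param="AMOUNT")

def vectorized():
    # Item 1: one operation over the whole column.
    return (df["AMOUNT"] // 100) * 3

# Both approaches should agree before their speed is compared.
assert with_apply().tolist() == vectorized().tolist()

print("apply:     ", timeit.timeit(with_apply, number=3))
print("vectorized:", timeit.timeit(vectorized, number=3))
```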

Change your function definition as below and pass appropriate arguments.

def calc(df, input_col, output_col, factor=3):
    # Vectorized: works on the whole column at once and writes the result into df in place.
    df[output_col] = (df[input_col].astype("int") // 100) * factor

Example:

>>> import pandas as pd
>>> def calc(df, input_col, output_col, factor=3):
...     df[output_col] = (df[input_col].astype("int") // 100) * factor
... 
>>> df1 = pd.DataFrame([1200,100,2425], columns=["AMOUNT"])
>>> df1
   AMOUNT
0    1200
1     100
2    2425
>>> calc(df1, "AMOUNT", "BILL_VALUE")
>>> df1
   AMOUNT  BILL_VALUE
0    1200          36
1     100           3
2    2425          72
>>> df2 = pd.DataFrame([3123,55,420], columns=["AMOUNT"])
>>> calc(df2, "AMOUNT", "BILL_VALUE")
>>> df2
   AMOUNT  BILL_VALUE
0    3123          93
1      55           0
2     420          12
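
Since the question mentions several dataframes, one possible variation is a function that returns the new column instead of modifying the frame, so the caller decides where to store it. This is only a sketch: the name bill_value, the TOTAL column, and the sample frames are hypothetical.

```python
import pandas as pd

def bill_value(df, input_col, factor=3):
    # Returns a new Series; df itself is not modified.
    return (df[input_col].astype("int") // 100) * factor

# Hypothetical frames whose amount columns have different names.
df1 = pd.DataFrame({"AMOUNT": [1200, 100, 2425]})
df2 = pd.DataFrame({"TOTAL": [3123, 55, 420]})

# Pair each frame with the name of its input column.
for frame, col in [(df1, "AMOUNT"), (df2, "TOTAL")]:
    frame["BILL_VALUE"] = bill_value(frame, col)

print(df1["BILL_VALUE"].tolist())  # [36, 3, 72]
print(df2["BILL_VALUE"].tolist())  # [93, 0, 12]
```

Returning the Series keeps the assignment visible at the call site, which can make it easier to follow when the same calculation is reused across frames.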

Reference:

Does pandas iterrows have performance issues?

รยקคгรђשค