You don't need df.apply here.
Performance precedence of operations (fastest first), as per this answer (see Reference below):
- vectorization
- using a custom cython routine
- apply
  a) reductions that can be performed in cython
  b) iteration in python space
- itertuples
- iterrows
- updating an empty frame (e.g. using loc one-row-at-a-time)
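To make the list above concrete, here is a minimal sketch that computes the same values three ways: vectorized, with apply, and with itertuples. The column name AMOUNT and the factor of 3 are taken from the example in this answer; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"AMOUNT": [1200, 100, 2425]})
factor = 3

# Vectorized: one expression evaluated over the whole column.
vectorized = (df["AMOUNT"] // 100) * factor

# apply: a Python-level function call for every element.
applied = df["AMOUNT"].apply(lambda amount: (amount // 100) * factor)

# itertuples: an explicit Python loop over the rows.
looped = [(row.AMOUNT // 100) * factor for row in df.itertuples(index=False)]

# All three give the same values; only the speed differs.
print(vectorized.tolist(), applied.tolist(), looped)
```

The results are identical, so the choice between them comes down to speed and readability, which is why the vectorized form is usually preferred.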
Change your function definition as shown below and pass the appropriate arguments.
def calc(df, input_col, output_col, factor=3):
    # Vectorized: operates on the whole column at once and adds output_col in place.
    df[output_col] = (df[input_col].astype("int") // 100) * factor
Example:
>>> import pandas as pd
>>> def calc(df, input_col, output_col, factor=3):
...     df[output_col] = (df[input_col].astype("int") // 100) * factor
...
>>> df1 = pd.DataFrame([1200,100,2425], columns=["AMOUNT"])
>>> df1
   AMOUNT
0    1200
1     100
2    2425
>>> calc(df1, "AMOUNT", "BILL_VALUE")
>>> df1
   AMOUNT  BILL_VALUE
0    1200          36
1     100           3
2    2425          72
>>> df2 = pd.DataFrame([3123,55,420], columns=["AMOUNT"])
>>> calc(df2, "AMOUNT", "BILL_VALUE")
>>> df2
   AMOUNT  BILL_VALUE
0    3123          93
1      55           0
2     420          12
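If you want to check the speed difference on your own machine, a rough timing sketch with timeit might look like the following. The 100,000-row frame is made-up test data, and the helper names vectorized and with_apply are hypothetical; actual timings depend on your pandas version and hardware, so none are quoted here.

```python
import timeit

import pandas as pd

# Made-up test data: 100,000 rows of integer amounts.
df = pd.DataFrame({"AMOUNT": range(100_000)})


def vectorized(frame):
    # Whole-column arithmetic, as in calc above.
    return (frame["AMOUNT"] // 100) * 3


def with_apply(frame):
    # Same computation, but one Python function call per element.
    return frame["AMOUNT"].apply(lambda amount: (amount // 100) * 3)


# Run each version 10 times and report the total elapsed time.
t_vec = timeit.timeit(lambda: vectorized(df), number=10)
t_apply = timeit.timeit(lambda: with_apply(df), number=10)
print(f"vectorized: {t_vec:.4f}s, apply: {t_apply:.4f}s")
```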
Reference:
Does pandas iterrows have performance issues?