Doing it your way is slower.
I used the following dataframe:
import pandas as pd
import string

df = pd.DataFrame({key: range(0, 10000) for key in string.ascii_lowercase})
Then, in a Jupyter notebook, I used the %%timeit cell magic to measure how long the following pieces of code take to run:
%%timeit
def f(df):
    df = df[['a', 'b']].copy()
    df['New Column'] = df['a'] * df['b']
    return df['New Column'].astype(int)

df['New Column'] = f(df)
This took 1.05 ms ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each).
Whereas the following piece of code
%%timeit
df['New Column'] = df['a'] * df['b']
df['New Column'].astype(int)
only took 340 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each).
About 3 times faster!
These two pieces of code are not completely functionally equivalent (the second one does not make the copy that the first one does). When we add a copy to the second piece of code, we get this:
%%timeit
df2 = df[['a','b']].copy()
df['New Column'] = df2['a'] * df2['b']
df['New Column'].astype(int)
This runs in 820 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each). A less dramatic difference, but still faster than your method.
Adding the copy more than doubled the time it takes to compute!
So what happens when we move the definition of your function outside of the timed cell? We run this piece of code first:
import pandas as pd
import string

df = pd.DataFrame({key: range(0, 10000) for key in string.ascii_lowercase})

def f(df):
    df = df[['a', 'b']].copy()
    df['New Column'] = df['a'] * df['b']
    return df['New Column'].astype(int)
and then this piece of code:
%%timeit
df['New Column'] = f(df)
This runs in 1.03 ms ± 2.02 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each), still slower than not using your function.
Note: runtimes depend on the specs of your computer and on other background tasks running. Mileage may vary.
That was the objective, measurement-based part of my answer.
As to whether your method is the preferred one: I would say no.
1) The way you do it in your example function f is very hard-coded.
Your function f only works for columns a and b. What if you want to do the same thing for columns n and m? Currently you would need to write a completely new function to process those columns. If you want to stick with this approach, it would be better to make your code more general (see the sketch after this list). TL;DR: do not hardcode.
2) You make a copy of the dataframe, which does not speed up the final calculation (it does require extra resources, though, resulting in extra overhead).
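For point 1), here is a minimal sketch of a more general version, assuming you want to keep a function-based approach: the column names are passed in as parameters instead of being hard-coded, and the intermediate copy is dropped. The names product_of_columns, col_a, col_b and dtype are my own illustrative choices, not from your code.

import pandas as pd

def product_of_columns(df, col_a, col_b, dtype=int):
    # Element-wise product of two arbitrary columns, cast to the requested dtype.
    # No intermediate copy of the dataframe is made.
    return (df[col_a] * df[col_b]).astype(dtype)

# Works for any pair of numeric columns, not just 'a' and 'b'.
df['New Column'] = product_of_columns(df, 'n', 'm')

This handles any pair of numeric columns without rewriting the function each time.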