When you use apply, it loops over the data in Python and converts each value to an integer one at a time, so it is slower than astype.
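To make the "looping over the data" point concrete, here is a minimal sketch showing that apply with a Python function gives the same result as an explicit per-element loop. The three-element Series s is just illustrative data, not part of the benchmark.

```python
import pandas as pd

# Illustrative data: integers stored as strings.
s = pd.Series(["1", "2", "3"])

# apply calls int() once per element at the Python level...
applied = s.apply(int)

# ...which is conceptually the same as this explicit loop.
looped = pd.Series([int(v) for v in s], index=s.index)

print(applied.tolist() == looped.tolist())
```

That per-element Python call is the overhead the faster approaches avoid.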
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10**7).reshape(10**4, 10**3)).astype(str)
# Performance
%timeit df[0].apply(int)
7.15 ms ± 319 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df[0].apply(lambda x : int(x))
9.57 ms ± 405 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Both perform about the same.
astype, on the other hand, is optimized for this conversion and runs faster than apply.
%timeit df[0].astype(int)
1.94 ms ± 96.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If you want something faster still, use the vectorized conversion that NumPy arrays provide.
%timeit df[0].values.astype(int)
1.26 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
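As you can see, the difference is large: in these timings astype is about 3.7 times faster than apply(int), and the NumPy version is about 5.7 times faster.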
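%timeit only works in IPython/Jupyter, so here is a rough sketch of the same comparison as a plain script using the standard timeit module. The Series s (10**4 string values, the same length as df[0]), the candidates dict, and the repeat counts are my own choices for illustration. Absolute numbers will vary with your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: 10,000 integers stored as strings.
s = pd.Series(np.arange(10**4)).astype(str)

candidates = {
    "apply(int)": lambda: s.apply(int),
    "apply(lambda)": lambda: s.apply(lambda x: int(x)),
    "astype(int)": lambda: s.astype(int),
    "values.astype(int)": lambda: s.values.astype(int),
}

# Sanity check: all four approaches should produce the same integers.
expected = list(range(10**4))
for name, func in candidates.items():
    assert list(func()) == expected, name

# Best of 5 repeats, 20 calls each; report milliseconds per call.
timings = {}
for name, func in candidates.items():
    best = min(timeit.repeat(func, number=20, repeat=5)) / 20
    timings[name] = best
    print(f"{name:20s} {best * 1e3:.3f} ms per call")
```

Taking the minimum over repeats reduces noise from other processes. It is worth checking that the results match before comparing speed.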