pandas apply convert to int differences between np.int, lambda and astype()

Question

Given a df

df = pd.DataFrame(['0', '1', '2', '3'], columns=['a'])

What is the difference between using

df['b'] = df['a'].apply(np.int)

,

df['b'] = df['a'].apply(lambda x : int(x))

and

df['b'] = df['a'].astype(int)

?

I'm aware that all will give the same result but are there any differences?

Possible duplicate of [Difference between np.int, np.int\_, int, and np.int\_t in cython?](https://stackoverflow.com/questions/21851985/difference-between-np-int-np-int-int-and-np-int-t-in-cython) — Dominique Paul, Oct 21 '18 at 11:23

score 0 · Answer 1 · answered Oct 21 '18 at 11:26

np.int is an alias for the built-in int. (The alias was deprecated in NumPy 1.20 and removed in NumPy 1.24.)

You can test this by running:

import numpy as np
print(int == np.int)

which will print True on NumPy versions that still provide np.int.
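A minimal sketch of the same check that also runs on newer NumPy releases, where the alias may be missing. The names `legacy_alias` and `converter` are only illustrative and are not part of the original answer.

```python
import warnings

import numpy as np

# np.int only ever pointed at the built-in int; newer NumPy releases
# no longer define it, so look it up defensively.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the deprecation warning on NumPy 1.20-1.23
    legacy_alias = getattr(np, "int", None)

if legacy_alias is None:
    print("np.int is not available in NumPy", np.__version__)
else:
    print("np.int is int:", legacy_alias is int)

# The built-in int can be passed to apply regardless of the NumPy version.
converter = int
```

Passing `int` itself, for example `df['a'].apply(int)`, avoids depending on the alias at all.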

Also, consider checking out this question, which explains the topic very thoroughly.

Dominique Paul

score 0 · Answer 2 · answered Oct 21 '18 at 12:17

The two lines below use the pandas apply function to call an int conversion on each element in turn. NumPy's np.int is the same object as Python's int, so the two lines are equivalent.

df['b'] = df['a'].apply(np.int)
df['b'] = df['a'].apply(lambda x : int(x))

The astype function, however, casts the whole Series to the specified dtype in one call. Here the dtype is int, which pandas usually stores as int64.

df['b'] = df['a'].astype(int)

astype is a vectorized operation, and I would prefer it over apply. Both do work proportional to the number of elements, but apply calls a Python function once per element, which makes it noticeably slower.
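As a rough illustration of the point above, the sketch below runs all three conversions on the DataFrame from the question and compares the results. It uses the built-in `int` in place of `np.int`, on the assumption that the alias may not exist in the installed NumPy; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(['0', '1', '2', '3'], columns=['a'])

# Element-wise: apply calls the converter once per value.
via_apply = df['a'].apply(int)
via_lambda = df['a'].apply(lambda x: int(x))

# Whole-Series cast: a single call converts every value.
via_astype = df['a'].astype(int)

for name, result in [('apply(int)', via_apply),
                     ('apply(lambda)', via_lambda),
                     ('astype(int)', via_astype)]:
    print(name, result.tolist(), result.dtype)
```

All three produce the same integer values; what differs is how the conversion is carried out, not the result.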

score 0 · Answer 3 · answered Oct 21 '18 at 12:21

When you use apply, pandas loops over the data and converts each value to an integer one at a time, so both apply variants are slower than astype.

df = pd.DataFrame(np.arange(10**7).reshape(10**4, 10**3)).astype(str)

# Performance
%timeit df[0].apply(np.int)
7.15 ms ± 319 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df[0].apply(lambda x : int(x))
9.57 ms ± 405 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The two apply variants perform almost the same.

astype, by contrast, is optimized and runs faster than apply:

%timeit df[0].astype(int)
1.94 ms ± 96.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

If you are looking for something faster still, convert the underlying NumPy array directly:

%timeit df[0].values.astype(np.int)
1.26 ms ± 19.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

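As you can see, the difference is substantial: in these timings the NumPy conversion is roughly 6 to 8 times faster than the two apply variants.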
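The `%timeit` lines above only work inside IPython. A minimal sketch of the same comparison with the standard `timeit` module follows. It assumes a smaller Series than the one in the answer so that it finishes quickly, uses the built-in `int` instead of `np.int`, and uses `to_numpy(dtype=object)` in place of `.values`; the names `candidates` and `timings` are illustrative. Absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# A Series of numeric strings, much smaller than the one timed above.
s = pd.Series(np.arange(10**4)).astype(str)

candidates = {
    "apply(int)": lambda: s.apply(int),
    "apply(lambda)": lambda: s.apply(lambda x: int(x)),
    "astype(int)": lambda: s.astype(int),
    "ndarray.astype(int)": lambda: s.to_numpy(dtype=object).astype(int),
}

repeats = 20
timings = {
    name: timeit.timeit(fn, number=repeats) / repeats
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:>20}: {seconds * 1e3:.3f} ms per call")
```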
