Why is pandas faster than numpy on simple mathematical operations?

Question

I recently observed that pandas is faster than numpy at element-wise multiplication, as the example below shows. How is this possible for such a simple operation, given that the underlying data containers within pandas DataFrames are numpy arrays?

Measurements

I use an array and a DataFrame, each of shape (10000, 10000).

import numpy as np
import pandas as pd

a = np.random.randn(10000, 10000)
d = pd.DataFrame(a.copy())

a.shape

(10000, 10000)

d.shape

(10000, 10000)

%%timeit
d * d

53.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
a * a

318 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Observations

pandas evaluates this simple multiplication about six times faster than numpy (53.2 ms versus 318 ms). How can this be?

Does this answer your question? [Numpy / Pandas optimized vector operations](https://stackoverflow.com/questions/55303847/numpy-pandas-optimized-vector-operations) — Joe, Jun 17 '20 at 08:38
https://stackoverflow.com/questions/17390886/how-to-speed-up-pandas-multilevel-dataframe-sum — Joe, Jun 17 '20 at 08:39

score 2 · Answer 1 · answered Jun 17 '20 at 08:25

Pandas uses `numexpr` behind the scenes

pandas uses numexpr under the hood if it is installed, which is the case on my machine. Calling numexpr explicitly on the numpy array gives the measurement below.
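As an aside before the measurement: whether pandas is allowed to take the numexpr path can be checked through its options system. This is a minimal sketch, assuming a pandas version that exposes the `compute.use_numexpr` option. The frame `d_small` is only a stand-in for the 10000 x 10000 one in the question, and for small inputs pandas may fall back to plain numpy regardless of the setting.

```python
import numpy as np
import pandas as pd

# True when pandas is allowed to use numexpr (numexpr must also be installed).
print(pd.get_option("compute.use_numexpr"))

# Small stand-in frame (assumption: not the frame from the question).
d_small = pd.DataFrame(np.random.randn(1000, 1000))

# Temporarily disable the numexpr path; the values must be the same either way.
with pd.option_context("compute.use_numexpr", False):
    without_numexpr = d_small * d_small

with_default = d_small * d_small
print(np.allclose(without_numexpr.values, with_default.values))
```

Timing `d * d` inside and outside the `option_context` block on the large frame should show how much of the gap is due to numexpr.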

Measurement

numexpr.evaluate takes a numerical expression as a string and evaluates it on numpy.ndarrays.

import numexpr

%%timeit
numexpr.evaluate('a * a')

52.7 ms ± 398 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Observations

The wall time for evaluating the product of an array with itself is now roughly the same as the time pandas needs (52.7 ms versus 53.2 ms).
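A likely reason for the gap, stated here as an assumption rather than something measured in this answer, is that numexpr works through the arrays in chunks on several threads, while `a * a` in plain numpy runs on one. The sketch below checks that both routes compute the same values and shows how to inspect and limit the thread count. The array `a_small` is deliberately smaller than the one above so that it runs quickly.

```python
import numpy as np
import numexpr

# Smaller stand-in array (assumption: not the array from the question).
a_small = np.random.randn(2000, 2000)

# Pass the operand explicitly instead of relying on numexpr looking up
# the variable name in the calling frame.
via_numexpr = numexpr.evaluate("a * a", local_dict={"a": a_small})
via_numpy = a_small * a_small
print(np.allclose(via_numexpr, via_numpy))

# Number of cores numexpr detected on this machine.
print(numexpr.detect_number_of_cores())

# set_num_threads returns the previous setting. Restrict numexpr to one
# thread for a like-for-like comparison with numpy, then restore it.
previous = numexpr.set_num_threads(1)
single_threaded = numexpr.evaluate("a * a", local_dict={"a": a_small})
numexpr.set_num_threads(previous)
```

Timing the single-threaded call against `a * a` on the large array would indicate how much of the speedup comes from threading.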

Conclusion

There can be cases where pandas is faster than numpy alone. On the other hand, using numexpr together with numpy gives the same speedup, but you have to call it yourself. Additionally, this is not a typical use case for pandas. Usually a DataFrame has an Index or a MultiIndex (hierarchical index) attached to at least one axis. Multiplying DataFrames whose MultiIndexes are not equal (which involves alignment and broadcasting), for example, still needs to be investigated.
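To illustrate why labelled data is a different situation: pandas aligns operands on their labels before multiplying, which is extra work that the benchmark above never triggers because both operands share the same default index. A minimal sketch with made-up labels (the frames `left` and `right` are illustrative only, and no timing is claimed here):

```python
import pandas as pd

# Two frames whose indexes overlap only partially (hypothetical data).
left = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])
right = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=["b", "c", "d"])

# Rows are matched by label, not by position. Labels present in only
# one operand produce NaN in the result.
product = left * right
print(product)
```

Only the labels `b` and `c` occur in both frames, so only those rows hold numbers in `product`.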
