Recently, I observed that pandas is faster than NumPy at element-wise multiplication, as the example below shows. How is this possible for such a simple operation? How is it possible at all, given that the data underlying a pandas DataFrame is stored in NumPy arrays?
Measurements
I use a NumPy array and a pandas DataFrame holding the same data, both of shape (10000, 10000).
import numpy as np
import pandas as pd
a = np.random.randn(10000, 10000)
d = pd.DataFrame(a.copy())
a.shape
(10000, 10000)
d.shape
(10000, 10000)
%%timeit
d * d
53.2 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
a * a
318 ms ± 12.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
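The timings above come from an IPython session. To rule out notebook effects, the same comparison can be run as a standalone script. This is only a minimal sketch: the function name `time_multiplication`, the smaller default size, and the choice of taking the minimum over several runs are my own assumptions, not part of the original measurement.

```python
import timeit

import numpy as np
import pandas as pd


def time_multiplication(n=2000, repeats=5):
    """Return the best wall-clock time in seconds for a * a and d * d.

    `n` and `repeats` are illustrative defaults (hypothetical, chosen to
    keep the run short); use n=10000 to match the shapes in the question.
    """
    a = np.random.randn(n, n)
    d = pd.DataFrame(a.copy())
    # Take the minimum over several runs to reduce noise from other processes.
    t_numpy = min(timeit.repeat(lambda: a * a, number=1, repeat=repeats))
    t_pandas = min(timeit.repeat(lambda: d * d, number=1, repeat=repeats))
    return {"numpy": t_numpy, "pandas": t_pandas}


if __name__ == "__main__":
    results = time_multiplication()
    for name, seconds in results.items():
        print(f"{name}: {seconds * 1e3:.1f} ms")
```

Calling `time_multiplication(n=10000)` should reproduce the setup above, although the absolute numbers will depend on the machine and the installed library versions.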
Observations
pandas evaluates this simple multiplication about six times faster than NumPy (53.2 ms versus 318 ms). How can this be?