Is it really that bad?
Below I wanted to show you how to make your code faster. But then I realized that it also depends on the size of the dataset in use. Nonetheless, let us first have a look at your problem. I will run the same code on my machine in order to make the comparison. I will do everything for a big dataset (100 times yours) and a small one (your dataset).
Pandas is slow on some numerical computations. Let's see how slow it is compared to the equivalent numpy operations.
Using pandas 0.23.4 on Linux with 32 cores, within a Jupyter notebook.
(The results at the end use pandas 1.0.4 on Windows with 2 cores, also within a Jupyter notebook.)
Note that all results were obtained within a Jupyter notebook with unchanged settings. Under real-world conditions the results might differ.
Measurements
Here are my measurements.
Big Dataset
import pandas as pd
import numpy as np
a = np.random.randn(10000, 4000)
df0 = pd.DataFrame(a.copy())
df = df0.copy()
Note that I use a bit more data: 100 times more. Additionally, I use the %%time magic command for measurement instead of %%timeit.
df.shape
(10000, 4000)
I run the following cell twice. The first time you run it, the kernel might still be loading libraries or compiling something, so it will show different results. But you can assume that no internal state is changed and no results are cached on the DataFrame when performing a simple multiplication (as happens when you perform a groupby and aggregation).
Furthermore, I do not create a copy in each cell as you did. Nonetheless, the following creates a new DataFrame and keeps the old one. It is not merely a view on the left-hand side's DataFrame.
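We can verify that claim with np.shares_memory (a quick sketch of mine, not part of the original timing runs):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 40))
res = df * 1

# The result is a new DataFrame backed by freshly allocated memory,
# not a view on the left-hand side's data.
print(np.shares_memory(df.values, res.values))  # False
```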
%%time
_ = df * 1
CPU times: user 78 ms, sys: 90.6 ms, total: 169 ms
Wall time: 24.3 ms
If we assign the resulting DataFrame instance to the name df, the execution of the cell takes longer. Maybe because the garbage collector frees the DataFrame from the left-hand side: there is no reference left in the notebook to that one anymore. So be careful in your performance tests about what you are actually measuring!
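One way to see the old frame being freed is a weakref (a sketch of mine outside the notebook, assuming plain CPython reference counting):

```python
import gc
import weakref

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4))
old = weakref.ref(df)  # track the original frame without keeping it alive

df = df * 1            # rebind the name; no reference to the old frame remains
gc.collect()           # clean up any reference cycles as well

print(old() is None)   # the old DataFrame has been freed
```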
%%time
df = df * 1
CPU times: user 84.4 ms, sys: 94.7 ms, total: 179 ms
Wall time: 31.7 ms
Or with in-place multiplication:
%%time
df *= 1
CPU times: user 77.1 ms, sys: 97 ms, total: 174 ms
Wall time: 31 ms
Observations on the above: note that the total (CPU) time is higher than the wall time (what your wall clock, or nowadays your smartphone clock, measures). This tells us that some multiprocessing or concurrent multithreading is at work in the background.
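Outside a notebook you can make the same comparison yourself with time.perf_counter (wall time) and time.process_time (CPU time summed over all threads of the process) — a sketch, using a BLAS matrix multiplication as an operation that is typically multithreaded:

```python
import time

import numpy as np

a = np.random.randn(2000, 2000)

t_wall = time.perf_counter()
t_cpu = time.process_time()
_ = a @ a  # matrix multiplication: BLAS usually spreads it over several threads
wall = time.perf_counter() - t_wall
cpu = time.process_time() - t_cpu

# With a multithreaded backend, cpu can exceed wall, just like
# "total" exceeds "Wall time" in the %%time output above.
print(f"wall: {wall * 1e3:.1f} ms, cpu: {cpu * 1e3:.1f} ms")
```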
Let's continue now with how to make things faster. You basically tried the following:
%%time
df[:] = df.values * 1.
CPU times: user 258 ms, sys: 234 ms, total: 492 ms
Wall time: 491 ms
This is not faster, because __setitem__, which is quite sophisticated on a pandas.DataFrame, is slow. You get the same with loc:
%%time
df.loc[:] = df.values * 1.
CPU times: user 260 ms, sys: 224 ms, total: 485 ms
Wall time: 483 ms
Accessing the data directly
You can access the data directly and set the values. This seems to be faster. (But you might run into problems if you have mixed datatypes in the DataFrame.)
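The mixed-datatype caveat can be demonstrated directly: with mixed dtypes, .values has to build a consolidated object array, which is a copy, so operating on it does not reach the DataFrame's own data (a sketch of mine, not from the measurements):

```python
import numpy as np
import pandas as pd

# Homogeneous float frame: .values can hand back the underlying block.
df_num = pd.DataFrame(np.random.randn(5, 2))
print(df_num.values.dtype)  # float64

# Mixed dtypes: .values must consolidate into an object array -- a copy.
df_mix = pd.DataFrame({"x": [1.0, 2.0], "y": ["a", "b"]})
vals = df_mix.values
print(vals.dtype)  # object
print(np.shares_memory(vals, df_mix["x"].to_numpy()))  # False: a copy
```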
%%time
df.values[...] = df.values * 1.
CPU times: user 95.7 ms, sys: 78.5 ms, total: 174 ms
Wall time: 173 ms
Or, even faster, do everything in place (as long as df.values[...] returns a reference to the data store):
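Whether df.values really is a reference into the data store, and not a copy, is worth checking, since pandas does not guarantee it for every dtype layout. A quick sanity check with np.shares_memory (my addition):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 400))

# For a single-dtype frame, .values is typically a view on the block store:
# two calls hand back arrays over the same memory, not independent copies.
v1 = df.values
v2 = df.values
print(np.shares_memory(v1, v2))  # True here
```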
%%time
df.values[...] *= 1
CPU times: user 43.4 ms, sys: 0 ns, total: 43.4 ms
Wall time: 42.6 ms
Can it be faster than that? Let's compare this with the following multiplications. First by multiplying the initial dataset, the numpy array a ...
%%time
_ = a * 1
CPU times: user 45.9 ms, sys: 82.7 ms, total: 129 ms
Wall time: 128 ms
... and by performing the corresponding inplace multiplication.
%%time
a *= 1
CPU times: user 43.5 ms, sys: 0 ns, total: 43.5 ms
Wall time: 42.9 ms
This shows that less than about 43 milliseconds cannot be expected. Therefore, accessing the data directly and operating on it is as fast as operating on numpy arrays directly.
But note that in my example even the initial guess is faster than that, showing that some optimization takes place in pandas that does not in numpy. Strange!
Small Dataset
Here I make the same observations as you did. The trick of accessing the data directly works out best again (df.values[...] *= 1).
import numpy as np
import pandas as pd
a = np.random.randn(1000, 400)
df0 = pd.DataFrame(a.copy())
df = df0.copy()
df.shape
(1000, 400)
%%time
_ = df * 1
CPU times: user 4.23 ms, sys: 1.28 ms, total: 5.51 ms
Wall time: 2.83 ms
%%time
df = df * 1
CPU times: user 4.68 ms, sys: 188 µs, total: 4.87 ms
Wall time: 2.22 ms
%%time
df *= 1
CPU times: user 2.66 ms, sys: 1.76 ms, total: 4.42 ms
Wall time: 1.71 ms
%%time
df[:] = df.values * 1.
CPU times: user 4.28 ms, sys: 21 µs, total: 4.3 ms
Wall time: 3.51 ms
%%time
df.loc[:] = df.values * 1.
CPU times: user 3.77 ms, sys: 0 ns, total: 3.77 ms
Wall time: 3.13 ms
%%time
df.values[...] = df.values * 1.
CPU times: user 2.19 ms, sys: 0 ns, total: 2.19 ms
Wall time: 1.38 ms
%%time
df.values[...] *= 1
CPU times: user 211 µs, sys: 1.05 ms, total: 1.26 ms
Wall time: 681 µs
%%time
_ = a * 1
CPU times: user 1.61 ms, sys: 0 ns, total: 1.61 ms
Wall time: 818 µs
%%time
a *= 1
CPU times: user 379 µs, sys: 950 µs, total: 1.33 ms
Wall time: 671 µs
Open Questions
It looks like simple multiplications are sometimes faster with pandas than with numpy. Here for the big dataset from above:
%%timeit
_ = df * df
22.8 ms ± 590 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
_ = a * a
133 ms ± 4.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
It does not matter whether I use %%timeit or %%time; the results are the same.
%%time
_ = df * df
CPU times: user 62.3 ms, sys: 99.2 ms, total: 162 ms
Wall time: 23.8 ms
%%time
_ = a * a
CPU times: user 57.6 ms, sys: 82.3 ms, total: 140 ms
Wall time: 139 ms
I did not expect this. And you?
I cross-checked this on Windows 10 with 2 cores and pandas 1.0.4. The results look basically the same, although the relative differences are not as big anymore.
%%timeit
df * df
165 ms ± 5.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
a * a
251 ms ± 9.71 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
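A plausible explanation, which is an assumption of mine and not something I verified in the measurements above: pandas can route large elementwise expressions through numexpr, which is multithreaded, while a plain a * a in numpy runs single-threaded. You can toggle this path via the compute.use_numexpr option and re-time both variants:

```python
import numpy as np
import pandas as pd

a = np.random.randn(2000, 2000)
df = pd.DataFrame(a)

# Compute once with pandas' numexpr path disabled, once with the default.
pd.set_option("compute.use_numexpr", False)
res_plain = df * df
pd.reset_option("compute.use_numexpr")
res_default = df * df

# Both paths must, of course, agree on the result.
print(np.allclose(res_plain.to_numpy(), res_default.to_numpy()))  # True
```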