Why is `numpy` slower than python for left bit shifts?

Question

I am trying to do bit shifts on numpy integers (specifically, numpy.uint64 objects) and I need them to be fast. In my implementation below, I put the object in a numpy.array only because that was the only way I could get left bit shifts to work. I'll accept any faster implementation.

```python
from timeit import timeit
print(timeit("a << 1", "a = int(2**60)"))
print(timeit("a << 1", "import numpy as np; a = np.array([2 ** 60], dtype=np.uint64)"))
print(timeit("np.left_shift(a, 1)", "import numpy as np; a = np.array([2 ** 60], dtype=np.uint64)"))
```

Output:

```
0.056681648000000084
1.208092987
1.1685176299999998
```

Why is python so much faster than numpy for this operation? Is there a way to get comparable speeds in numpy?

You're applying a vectorized shift to a single element. There's a big overhead just in reaching the shift loop and setting up the NumPy structures; native Python shifts are simply faster. But if you shift 10,000 elements at once, that will change. — Jean-François Fabre, Oct 12 '18 at 15:19
How would you suggest speeding it up? `a << 1` doesn't work on a numpy uint64 unless the object is in an array — Paul Terwilliger, Oct 12 '18 at 15:21
Sounds like there's a limitation: https://github.com/numpy/numpy/issues/2524 — Jean-François Fabre, Oct 12 '18 at 15:24
If Numba is an option for you, you can also try something like that. https://stackoverflow.com/a/45070947/4045774 — max9111, Oct 13 '18 at 18:08

Jean-François Fabre · Accepted Answer · 2018-10-12T15:33:07.267


The performance difference makes sense: you're applying a vectorized shift to a single element. There's a big overhead just in reaching the shift loop and setting up the NumPy structures, whereas a native Python int shift is simply faster.
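To illustrate the point that the overhead is paid per call rather than per element, here is a rough benchmark sketch. The array size of 10,000 and the helper names `shift_each`, `shift_all` and `shift_python` are illustrative choices, not from the original post. It checks that all three approaches produce the same values and then times them; the actual numbers depend on your machine and NumPy version, so none are claimed here.

```python
import numpy as np
from timeit import timeit

values = np.arange(10_000, dtype=np.uint64)
py_values = values.tolist()  # the same numbers as plain Python ints
one = np.uint64(1)           # uint64 shift amount, so no type promotion is needed

def shift_each(arr):
    # One NumPy scalar operation per element: the per-call overhead is paid 10,000 times.
    return np.array([x << one for x in arr], dtype=np.uint64)

def shift_all(arr):
    # One ufunc call for the whole array: the overhead is paid once.
    return arr << one

def shift_python(lst):
    # Native Python int shifts, one per element.
    return [v << 1 for v in lst]

assert np.array_equal(shift_each(values), shift_all(values))
assert shift_python(py_values) == shift_all(values).tolist()

print("numpy, per element:", timeit(lambda: shift_each(values), number=10))
print("python ints:       ", timeit(lambda: shift_python(py_values), number=10))
print("numpy, vectorized: ", timeit(lambda: shift_all(values), number=10))
```

As the array grows, the single vectorized call is where NumPy is expected to pull ahead, while for a single value the plain Python int stays cheaper.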

Okay, I googled the error message you get when you try to do that on a single numpy.uint64 scalar instead of an array:

```
>>> a = numpy.uint64(2**60)
>>> a << 3
Traceback (most recent call last):
  File "<string>", line 301, in runcode
  File "<interactive input>", line 1, in <module>
TypeError: ufunc 'left_shift' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
```

That led me to this GitHub issue: https://github.com/numpy/numpy/issues/2524

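This happens because the shift amount (a Python int) is converted to a signed integer type, and there is no signed integer type big enough to hold every uint64 value. The two operands are therefore promoted to float64, and `left_shift` has no float implementation.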
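A small sketch of that promotion step. It uses `np.promote_types` to show the common type NumPy picks for uint64 and int64, and the ufunc's `types` attribute to list which input/output loops `left_shift` actually has. The variable names are just for illustration.

```python
import numpy as np

# No signed integer type can hold every uint64 value, so the common
# type of uint64 and int64 falls back to float64.
common = np.promote_types(np.uint64, np.int64)
print(common)  # float64

# left_shift only has integer (and object) loops; there is no
# float64 ("dd->d") loop to dispatch to, hence the TypeError.
float_loops = [t for t in np.left_shift.types if t.startswith("dd")]
print(float_loops)  # []

# With both operands uint64, no promotion is needed and the shift works.
print(np.uint64(2**60) << np.uint64(3))
```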

A good workaround (from a comment on that GitHub issue) is to make the shift amount a uint64 as well:

```python
a << numpy.uint64(1)
```

(It may be worth building the `numpy.uint64(1)` constant once and reusing it throughout your code, to avoid creating a new object on every shift.)
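A minimal sketch of that suggestion, assuming you only ever shift by one bit. `ONE` and `shl1` are hypothetical names chosen for this example. The timing lines extend the question's benchmark so you can compare a precomputed constant against building `np.uint64(1)` on every call; results will vary by machine and NumPy version.

```python
import numpy as np
from timeit import timeit

ONE = np.uint64(1)  # hypothetical constant, built once and reused for every shift

def shl1(x):
    """Shift a numpy.uint64 scalar left by one bit; the result stays uint64."""
    return x << ONE

a = np.uint64(2**60)
b = shl1(a)
print(int(b) == 2**61)  # True

setup = "import numpy as np; a = np.uint64(2**60); one = np.uint64(1)"
print("python int:        ", timeit("a << 1", "a = 2**60", number=100_000))
print("uint64, constant:  ", timeit("a << one", setup, number=100_000))
print("uint64, new object:", timeit("a << np.uint64(1)", setup, number=100_000))
```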

answered Oct 12 '18 at 15:27, edited Oct 12 '18 at 15:33

Update your answer with this comment and I'll give you the green check mark: https://github.com/numpy/numpy/issues/2524#issuecomment-348538957 (this approach is comparable in speed; I was in the process of writing up this answer myself). – Paul Terwilliger Oct 12 '18 at 15:28
The scalar/array shift case works because scalars get "demoted" to a smaller type if possible in mixed scalar/array operations, with some special handling to pick an unsigned type if necessary. See [numpy.result_type](https://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html#numpy.result_type). – user2357112 Oct 12 '18 at 15:34
There's certainly risk of confusion, and I'd prefer to make the types explicit rather than rely on it. – user2357112 Oct 12 '18 at 15:37
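To make the two cases in these comments concrete, here is a short sketch using the question's array. It compares the implicit mixed array/Python-int shift, which relies on NumPy keeping the array's uint64 dtype for a small int, with the explicit form where the shift amount already has the array's dtype. The names `implicit` and `explicit` are only for illustration.

```python
import numpy as np

a = np.array([2**60], dtype=np.uint64)

# Implicit: a uint64 array shifted by a Python int. The small int does not
# force a wider type, so the result keeps the array's uint64 dtype.
implicit = a << 1

# Explicit: the shift amount is already uint64, so the result type does not
# depend on any scalar-promotion rules.
explicit = np.left_shift(a, np.uint64(1))

print(np.result_type(a, np.uint64(1)))  # uint64
print(implicit.dtype, explicit.dtype)   # uint64 uint64
```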
