I had a performance issue in a numpy project, and I realized that about three quarters of the execution time was spent on a single line of code:
error = abs(detected_matrix[i, step] - original_matrix[j, new])
and when I changed the line to
error = abs(original_matrix[j, new] - detected_matrix[i, step])
the problem disappeared.
I realized that the type of original_matrix was float64 and the type of detected_matrix was float32. Changing the type of either of these two variables solved the problem.
Is this a well-known issue?
Here is a sample that reproduces the problem:
from timeit import timeit
import numpy as np
f64 = np.array([1.0], dtype='float64')[0]
f32 = np.array([1.0], dtype='float32')[0]
timeit_result = timeit(stmt="abs(f32 - f64)", number=1000000, globals=globals())
print(timeit_result)
timeit_result = timeit(stmt="abs(f64 - f32)", number=1000000, globals=globals())
print(timeit_result)
Output on my machine:
2.8707289
0.15719420000000017
which is quite strange.
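For now I am working around it by casting one operand so both matrices share a dtype before the loop, which keeps every subtraction within a single type. A minimal sketch (the small arrays here are placeholders standing in for detected_matrix and original_matrix):

```python
import numpy as np

# Placeholder arrays standing in for detected_matrix / original_matrix
detected_matrix = np.array([[1.5, 2.5]], dtype=np.float32)
original_matrix = np.array([[1.0, 2.0]], dtype=np.float64)

# Cast once, up front, so the per-element subtraction stays in one dtype
original_matrix32 = original_matrix.astype(np.float32)

# Both operands are now float32 scalars, so neither order hits the slow path
error = abs(detected_matrix[0, 0] - original_matrix32[0, 0])
print(error)  # 0.5
```

The one-time astype cost is negligible next to paying a mixed-dtype coercion on every scalar subtraction inside a loop.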