I have just noticed that the execution time of one of my scripts nearly halves when I change nothing but a multiplication into a division.
To investigate this, I have written a small example:
```python
import numpy as np
import timeit

# uint8 array
arr1 = np.random.randint(0, high=256, size=(100, 100), dtype=np.uint8)
# float32 array
arr2 = np.random.rand(100, 100).astype(np.float32)
arr2 *= 255.0


def arrmult(a):
    """
    mult, read-write iterator
    """
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        item[...] = (item + 5) * 0.5


def arrmult2(a):
    """
    mult, index iterator
    """
    b = a.copy()
    for i, j in np.ndindex(b.shape):
        b[i, j] = (b[i, j] + 5) * 0.5


def arrmult3(a):
    """
    mult, vectorized
    """
    b = a.copy()
    b = (b + 5) * 0.5


def arrdiv(a):
    """
    div, read-write iterator
    """
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        item[...] = (item + 5) / 2


def arrdiv2(a):
    """
    div, index iterator
    """
    b = a.copy()
    for i, j in np.ndindex(b.shape):
        b[i, j] = (b[i, j] + 5) / 2


def arrdiv3(a):
    """
    div, vectorized
    """
    b = a.copy()
    b = (b + 5) / 2


def print_time(name, t):
    print("{: <10}: {: >6.4f}s".format(name, t))


timeit_iterations = 100

print("uint8 arrays")
print_time("arrmult", timeit.timeit("arrmult(arr1)", "from __main__ import arrmult, arr1", number=timeit_iterations))
print_time("arrmult2", timeit.timeit("arrmult2(arr1)", "from __main__ import arrmult2, arr1", number=timeit_iterations))
print_time("arrmult3", timeit.timeit("arrmult3(arr1)", "from __main__ import arrmult3, arr1", number=timeit_iterations))
print_time("arrdiv", timeit.timeit("arrdiv(arr1)", "from __main__ import arrdiv, arr1", number=timeit_iterations))
print_time("arrdiv2", timeit.timeit("arrdiv2(arr1)", "from __main__ import arrdiv2, arr1", number=timeit_iterations))
print_time("arrdiv3", timeit.timeit("arrdiv3(arr1)", "from __main__ import arrdiv3, arr1", number=timeit_iterations))

print("\nfloat32 arrays")
print_time("arrmult", timeit.timeit("arrmult(arr2)", "from __main__ import arrmult, arr2", number=timeit_iterations))
print_time("arrmult2", timeit.timeit("arrmult2(arr2)", "from __main__ import arrmult2, arr2", number=timeit_iterations))
print_time("arrmult3", timeit.timeit("arrmult3(arr2)", "from __main__ import arrmult3, arr2", number=timeit_iterations))
print_time("arrdiv", timeit.timeit("arrdiv(arr2)", "from __main__ import arrdiv, arr2", number=timeit_iterations))
print_time("arrdiv2", timeit.timeit("arrdiv2(arr2)", "from __main__ import arrdiv2, arr2", number=timeit_iterations))
print_time("arrdiv3", timeit.timeit("arrdiv3(arr2)", "from __main__ import arrdiv3, arr2", number=timeit_iterations))
```
This prints the following timings:

```
uint8 arrays
arrmult   : 2.2004s
arrmult2  : 3.0589s
arrmult3  : 0.0014s
arrdiv    : 1.1540s
arrdiv2   : 2.0780s
arrdiv3   : 0.0027s

float32 arrays
arrmult   : 1.2708s
arrmult2  : 2.4120s
arrmult3  : 0.0009s
arrdiv    : 1.5771s
arrdiv2   : 2.3843s
arrdiv3   : 0.0009s
```
I always thought a multiplication is computationally cheaper than a division. However, for `uint8` a division seems to be nearly twice as efficient. Does this somehow relate to the fact that `* 0.5` has to calculate the multiplication in a float and then cast the result back to an integer?
At least for floats, multiplications seem to be faster than divisions. Is this generally true?
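One quick way to check the casting hypothesis is to look at the dtypes NumPy chooses for the vectorized forms of both expressions. The sketch below is a minimal example. The small arrays `u8` and `f32` are made-up stand-ins, not the arrays from the question. As far as I understand NumPy's promotion rules, both `uint8` expressions end up in `float64`, not only the multiplication, because true division of integers also returns floats. The `float32` expressions stay in `float32`.

```python
import numpy as np

# Small made-up stand-ins for arr1 (uint8) and arr2 (float32).
u8 = np.arange(0, 250, 10, dtype=np.uint8)
f32 = u8.astype(np.float32)

for name, arr in [("uint8", u8), ("float32", f32)]:
    shifted = arr + 5      # adding a small Python int keeps the input dtype
    mult = shifted * 0.5   # multiplication by a Python float
    div = shifted / 2      # true division by a Python int
    print(f"{name:<8} +5: {shifted.dtype}  *0.5: {mult.dtype}  /2: {div.dtype}")
```

If this holds on your NumPy version, the float round trip alone would not explain the difference between `* 0.5` and `/ 2` for `uint8`.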
Why is a multiplication in `uint8` more expensive than in `float32`? I thought calculations on an 8-bit unsigned integer should be much faster than on a 32-bit float?!
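To see whether the int-to-float round trip is what makes `uint8` slow, one could compare against a version that never leaves integer arithmetic. The sketch below assumes floor division `// 2` is an acceptable replacement. For non-negative values it gives the same result as truncating `x * 0.5`. `via_float` and `via_int` are hypothetical names, and timings will vary with machine and NumPy version.

```python
import timeit

import numpy as np

# Stand-in for the question's arr1 (uint8 values 0..255).
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)


def via_float(x):
    # (x + 5) is uint8; multiplying by 0.5 promotes to float64,
    # so the cast back to uint8 is done explicitly here.
    return ((x + 5) * 0.5).astype(np.uint8)


def via_int(x):
    # Floor division by a Python int keeps the whole computation in uint8.
    return (x + 5) // 2


# Both give identical values: truncating a non-negative x * 0.5 equals x // 2.
assert np.array_equal(via_float(a), via_int(a))

for f in (via_float, via_int):
    t = min(timeit.repeat(lambda: f(a), number=1000, repeat=5))
    print(f"{f.__name__:<10}: {t:.4f}s")
```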
Can someone "demystify" this?
EDIT: To have more data, I've included vectorized functions (as suggested) and added index iterators as well. The vectorized functions are much faster, so they are not really comparable to the loops. However, if `timeit_iterations` is set much higher for the vectorized functions, it turns out that multiplication is faster for both `uint8` and `float32`. I guess this confuses things even more?!
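Instead of raising `timeit_iterations` by hand for the fast vectorized functions, `timeit.Timer.autorange()` can choose the loop count. Taking the best of several repeats then reduces noise. This is a sketch that assumes per-call time is the quantity of interest. `per_call` is a hypothetical helper. The two functions mirror `arrmult3`/`arrdiv3`, with a `return` added so the result is actually used.

```python
import timeit

import numpy as np

# Same setup as in the question.
arr1 = np.random.randint(0, high=256, size=(100, 100), dtype=np.uint8)
arr2 = np.random.rand(100, 100).astype(np.float32)
arr2 *= 255.0


def arrmult3(a):
    b = a.copy()
    b = (b + 5) * 0.5
    return b


def arrdiv3(a):
    b = a.copy()
    b = (b + 5) / 2
    return b


def per_call(func, arr, repeat=5):
    """Best-of-`repeat` time in seconds for a single call of func(arr)."""
    timer = timeit.Timer(lambda: func(arr))
    number, _ = timer.autorange()  # smallest loop count taking >= 0.2 s in total
    return min(timer.repeat(repeat=repeat, number=number)) / number


for label, arr in [("uint8", arr1), ("float32", arr2)]:
    for func in (arrmult3, arrdiv3):
        print(f"{label:<8} {func.__name__:<9}: {per_call(func, arr) * 1e6:8.2f} us")
```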
Maybe multiplication is in fact always faster than division, and the main performance cost in the for-loops is not the arithmetic operation but the loop itself. However, this does not explain why the loops behave differently for different operations.
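One way to test the loop-overhead hypothesis is to time the same `nditer` loop with progressively more work in its body. The sketch assumes that the differences between these variants roughly approximate the cost of each step. `loop_only`, `loop_assign`, `loop_mult` and `loop_div` are hypothetical helpers that mirror `arrmult`/`arrdiv`, with a return value added.

```python
import timeit

import numpy as np

arr1 = np.random.randint(0, high=256, size=(100, 100), dtype=np.uint8)


def loop_only(a):
    # Same read-write nditer loop as arrmult/arrdiv, but an empty body.
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        pass
    return b


def loop_assign(a):
    # Writes a constant back, isolating the cost of the element assignment.
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        item[...] = 0
    return b


def loop_mult(a):
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        item[...] = (item + 5) * 0.5
    return b


def loop_div(a):
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        item[...] = (item + 5) / 2
    return b


for f in (loop_only, loop_assign, loop_mult, loop_div):
    t = min(timeit.repeat(lambda: f(arr1), number=10, repeat=3))
    print(f"{f.__name__:<12}: {t:.4f}s")
```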
EDIT2: As @jotasi already stated, we are looking for a full explanation of division vs. multiplication and `int` (or `uint8`) vs. `float` (or `float32`). Additionally, an explanation of the different trends of the vectorized approaches and the iterators would be interesting: in the vectorized case the division seems to be slower, whereas in the iterator case it is faster.
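One difference between the two families of functions may be relevant when comparing them. The loop versions write every result back into the copied array, so each result is cast to the input dtype. The vectorized versions rebind `b` to a brand-new array whose dtype comes from NumPy's promotion rules, and nothing is cast back. The sketch below adds return values to check this. `iter_mult`, `vec_mult` and `vec_div` are hypothetical names mirroring `arrmult`, `arrmult3` and `arrdiv3`.

```python
import numpy as np

arr1 = np.random.randint(0, high=256, size=(100, 100), dtype=np.uint8)
arr2 = np.random.rand(100, 100).astype(np.float32)
arr2 *= 255.0


def iter_mult(a):
    # Like arrmult: each result is assigned into the copy, so it is
    # cast to a's dtype on every assignment.
    b = a.copy()
    for item in np.nditer(b, op_flags=["readwrite"]):
        item[...] = (item + 5) * 0.5
    return b


def vec_mult(a):
    # Like arrmult3: `b = ...` rebinds the name to a new array; the copy
    # is discarded and the result keeps whatever dtype promotion produced.
    b = a.copy()
    b = (b + 5) * 0.5
    return b


def vec_div(a):
    # Like arrdiv3.
    b = a.copy()
    b = (b + 5) / 2
    return b


for label, arr in [("uint8", arr1), ("float32", arr2)]:
    print(label, iter_mult(arr).dtype, vec_mult(arr).dtype, vec_div(arr).dtype)
```

If the dtypes differ as expected, the vectorized `uint8` variants do their work in `float64` and produce a `float64` result. This is a different workload from the loops, which store `uint8` values.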