Does the second way use float32 for division directly? (I hope it does not use float64 as an intermediate dtype)
Yes. You can check that by looking at the code or, more directly, by scanning hardware events, which clearly show that single-precision floating-point arithmetic instructions are executed (at least with Numpy 1.18).
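A quicker sanity check is to look at the dtype of the result. The snippet below is only a sketch: I am assuming the two ways look roughly like `a.astype(np.float32) / 32768.0` and `a / np.float32(32768)`, and the sample values are made up. Note that the result dtype does not prove which instructions are executed, it only shows that the output is not float64.

```python
import numpy as np

# Hypothetical int16 samples (e.g. audio), chosen for illustration only.
a = np.array([-32768, 0, 16384, 32767], dtype=np.int16)

# Assumed "first way": explicit conversion, then division.
first = a.astype(np.float32) / 32768.0

# Assumed "second way": division by a float32 scalar, relying on type promotion.
second = a / np.float32(32768)

print(first.dtype, second.dtype)
print(np.array_equal(first, second))
```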
In general, is there a way to do division in a specified dtype?
AFAIK, not directly with Numpy. Type promotion rules always apply. However, it is possible with Numba to perform the conversions cell by cell, which is much more efficient than using an intermediate array (costly to allocate and to read/write).
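Here is a minimal sketch of what such a cell-by-cell conversion could look like. The function name `int16_to_float32` and the 1/32768 scale are my own choices for illustration, and the `try`/`except` fallback is only there so the snippet still runs (slowly) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback for illustration only: a no-op decorator when Numba is missing.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def int16_to_float32(src, dst):
    # Hypothetical helper: converts each int16 item to float32 and scales it,
    # writing directly into a preallocated float32 buffer (1-D arrays assumed).
    scale = np.float32(1.0 / 32768.0)  # power of two, so exact in float32
    for i in range(src.size):
        dst[i] = np.float32(src[i]) * scale


src = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
dst = np.empty(src.size, dtype=np.float32)
int16_to_float32(src, dst)
print(dst)
```

Each item is converted and scaled in a single pass, so no temporary float32 copy of the input is ever allocated.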
Do I need to specify some casting argument?
This is not needed here since there is no loss of precision in this case. Indeed, in the first version the input operands are of type `float32`, as is the result. For the second version, the type promotion rule is automatically applied and `a` is implicitly cast to `float32` before the division (probably more efficiently than with the first method, as no intermediate array needs to be created). The `casting` argument lets you control the level of safety here (`same_kind` by default for ufuncs): for example, you can set it to `no` to be sure that no cast occurs (for both operands and the result; an error is raised if a cast is needed). You can see the documentation of `can_cast` for more information.
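To illustrate the `casting` argument, here is a small sketch using `np.can_cast` and `np.divide`. The array `a` and the scale are made-up values standing in for the ones in the question.

```python
import numpy as np

a = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scale = np.float32(32768)

# int16 -> float32 loses nothing, so it is allowed under the "safe" rule...
safe_ok = np.can_cast(np.int16, np.float32, casting="safe")
# ...but it is still a cast, so the "no" rule rejects it.
no_ok = np.can_cast(np.int16, np.float32, casting="no")

# Works: the implicit int16 -> float32 cast of `a` is a safe one.
result = np.divide(a, scale, casting="safe")

# Fails: with casting="no", the implicit cast of `a` is refused.
try:
    np.divide(a, scale, casting="no")
    raised = False
except TypeError:
    raised = True

print(safe_ok, no_ok, result.dtype, raised)
```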
Same question about converting back from [-1.0, 1.0] float32 into int16
Similar answers apply. However, you should care about the type promotion rules, as `float32 * int16 -> float32`. Thus, the result of a `multiply` will have to be cast to `int16`, and a loss of accuracy appears. As a result, you can use the `casting` argument to enable `unsafe` casts (now deprecated) and maybe get better performance.
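A sketch of the reverse conversion could look like the following. The scale of 32767 is an assumption on my side (it keeps 1.0 inside the `int16` range); the sample values are made up as well. Note that the float-to-integer cast drops the fractional part rather than rounding.

```python
import numpy as np

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
out = np.empty(x.shape, dtype=np.int16)

# The product is computed in float32; writing it to an int16 output
# requires an unsafe cast, so the default casting rule refuses it.
try:
    np.multiply(x, np.float32(32767), out=out)
    default_raised = False
except TypeError:
    default_raised = True

# Explicitly allowing the unsafe cast makes the float32 -> int16 store work.
np.multiply(x, np.float32(32767), out=out, casting="unsafe")

print(default_raised, out)
```

If you want rounding instead of truncation, you need to round explicitly (e.g. with `np.rint`) before the cast.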
Notes & Advice:
I advise you to use Numba's `@njit` to perform the operation efficiently.
Note that modern processors are able to perform such operations very quickly if SIMD instructions are used. Consequently, the memory bandwidth and the cache allocation policy should be the two main limiting factors. Fast conversions can be achieved by preallocating buffers, by avoiding the creation of new temporary arrays, and by avoiding unnecessary copies of (large) arrays.
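As an example of the buffer preallocation mentioned above, the `out` argument of Numpy ufuncs lets you reuse one output array across calls. This is only a sketch: `to_float32` is a hypothetical helper and the chunk values are made up.

```python
import numpy as np

def to_float32(chunk, out):
    # Hypothetical helper: writes the scaled samples into `out`
    # instead of allocating a new result array on every call.
    np.divide(chunk, np.float32(32768), out=out)
    return out

# Allocate the output buffer once, then reuse it for every chunk.
buf = np.empty(4, dtype=np.float32)

chunk = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
result = to_float32(chunk, buf)

print(result is buf, result)
```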