Also SIMD implementations tend to be at least 128bit wide, so I wonder does that mean the (internal) precision of operations is higher than for the x87 FP unit?
The width of a SIMD register is not the width of the individual components of the vector it holds. Widely available SIMD instruction sets offer at most the IEEE 754 binary64 format (64 bits wide per component, with a 53-bit significand and an 11-bit exponent). This is not nearly as good, in either precision or range, as the historical 80-bit extended format, which has a 64-bit significand and a 15-bit exponent.
Many C compilers targeting x86 make the 80-bit format available as the long double type. I use it often. It is a good choice for most intermediate computations: using it helps make the end result more accurate even when that result is ultimately returned as a binary64 double. One example is the function in this question, for which a mathematically intuitive property holds for the final result if the intermediate computations are done in long double, but not if they are done in the same double type as the inputs and output.
Similarly, among the many constraints that had to be balanced when choosing the parameters of the 80-bit extended format, one consideration was that it makes it possible to compute a binary64 pow() accurately by composing the 80-bit expl() and logl(). The extra precision is necessary to obtain good accuracy in the end result.
I should note, however, that when the “intermediate” computation is a single basic operation, it is better not to go through extended precision. For example, when x and y are of type double, the accuracy of (double)(x * (long double)y) is very slightly worse than the accuracy of x * y. The two expressions almost always produce the same result, and in the rare cases where they differ, x * y is very slightly more accurate. This phenomenon is called double rounding: the exact product is first rounded to the extended format and then rounded again to double, and the first rounding can land exactly on the midpoint between two doubles even though the exact product was not on it.