
I have come across this vector/scalar division implementation:

public static Vector2 operator /(Vector2 value1, float divider)
{
    float factor = 1 / divider;
    value1.X *= factor;
    value1.Y *= factor;
    return value1;
}

I tried implementing it via simple division:

public static Vector2 operator /(Vector2 value1, float divider)
{
    return new Vector2(value1.X / divider, value1.Y / divider);
}

I tried running a simulation and it seems that the results are slightly different.

Is this some sort of trick to boost calculation precision?

Den
  • What are the different results? – Liam McInroy Jun 26 '14 at 22:07
  • The result won't be correctly rounded, but often the speedup from saving a division matters more than the loss of precision in the result. (And the result will be worse if you use this trick, not better.) – tmyklebu Jun 26 '14 at 22:22
  • Could you post a specific example? I expect the simple division to give the most precise possible answer, and the multiplication version to have slightly greater rounding error, but looking at a specific case would allow that to be checked. – Patricia Shanahan Jun 27 '14 at 01:03

3 Answers

10

No, it's just an attempt to make it faster, as multiplication is often faster than division.

Should I use multiplication or division?

One divide plus two multiplies may not always be faster than two divides, but it's almost certainly the reason the code is written this way.

If the vector had 3 or more dimensions I'd be more convinced it's worth it, but always profile these micro-optimisations.

The rounding error introduced by the extra operation is what causes the different results.
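
You can see that extra rounding directly. The following is a minimal sketch of mine, not from the original code (it assumes a modern .NET runtime for `BitConverter.SingleToInt32Bits`); with divider = 3 the two approaches already disagree in the last bit:

using System;

static class DivideDemo
{
    static float ViaReciprocal(float x, float divider)
    {
        float factor = 1 / divider; // rounded once here...
        return x * factor;          // ...and rounded again here
    }

    static float ViaDivision(float x, float divider)
    {
        return x / divider;         // a single, correctly rounded operation
    }

    static void Main()
    {
        float a = ViaReciprocal(10f, 3f);
        float b = ViaDivision(10f, 3f);
        // Print the exact values and bit patterns so the one-ulp gap is visible.
        Console.WriteLine($"reciprocal: {a:R}  bits: {BitConverter.SingleToInt32Bits(a):X8}");
        Console.WriteLine($"division:   {b:R}  bits: {BitConverter.SingleToInt32Bits(b):X8}");
    }
}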

weston
  • Based on Agner Fog's instruction tables, at least for Haswell, two perfectly scheduled divs would take 14 clocks, while a div and two muls would take 8. – Cory Nelson Jun 26 '14 at 22:23
  • I'm dubious about putting exact numbers on that. Different processors have different values. So which table are you referring to? Plus which exact operations? – weston Jun 26 '14 at 22:40
  • @weston: obviously the table for Haswell (http://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29). – Rudy Velthuis Jun 26 '14 at 22:57
  • See the Instruction Tables manual at [Agner Fog's site](http://www.agner.org/optimize/#manuals). I'm looking at `DIVSS` and `MULSS` for the Haswell architecture. – Cory Nelson Jun 26 '14 at 23:04
  • Interesting stuff. No idea how to interpret those tables myself! Why did you choose the SS operations over the other million muls and divs? – weston Jun 26 '14 at 23:29
  • `SS` = Scalar Single precision, i.e. `float`. As opposed to, e.g. `PD` = Packed (vector) Double precision. – Stephen Canon Jun 27 '14 at 00:41
4

Remember that a floating-point number has the form (significand * base^exponent), where the base is most often 2, and the significand has a limited number of digits in that base and lies in the interval [1,2), or [0,1) for subnormals.

According to the IEEE 754 standard, each floating-point operation (at least + - * / fmod remainder sqrt) behaves as if it were produced by this algorithm:

  1. perform the exact operation (significand1 * 2^exp1) op (significand2 * 2^exp2)
  2. round this exact result to the nearest representable float, according to the rounding direction currently in effect.

I've simplified a bit, because there is also the handling of exceptional values, programmable generation of FPU exceptions, etc.

So the more operations you perform, the more rounding approximations accumulate.

Since * is faster than / (try emulating both in software yourself), the first version, which factors out the inverse, is there to boost execution speed, but certainly not to boost precision.
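
To see how often the extra rounding actually shows up, here is a hypothetical experiment (the sampling ranges, seed, and trial count are arbitrary choices of mine) that counts mismatches between the two approaches:

using System;

static class RoundingExperiment
{
    static void Main()
    {
        var rng = new Random(42);
        int trials = 1000000, mismatches = 0;
        for (int i = 0; i < trials; i++)
        {
            float x = (float)(rng.NextDouble() * 100.0);
            float d = (float)(rng.NextDouble() * 100.0 + 0.1);
            float viaDiv = x / d;       // one rounding
            float viaMul = x * (1 / d); // two roundings
            if (viaDiv != viaMul) mismatches++;
        }
        Console.WriteLine($"{mismatches} of {trials} results differ in the last bit(s)");
    }
}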

aka.nice
2

IMHO, the precision should be worse rather than better, as you have two rounded operations and thus two chances for the floating-point approximation to kick in.

If the factor computation were more complex, it could be a way to avoid computing it twice, but that's not even the case here.
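
For example, a hypothetical case where factoring out the reciprocal clearly pays off is when one divider is shared across many elements (this sketch is mine, not from the answer):

using System;

static class ScaleExample
{
    static void Scale(float[] values, float divider)
    {
        float factor = 1 / divider;           // a single division...
        for (int i = 0; i < values.Length; i++)
            values[i] *= factor;              // ...then a cheap multiply per element
    }

    static void Main()
    {
        float[] v = { 1f, 2f, 3f };
        Scale(v, 4f);
        Console.WriteLine(string.Join(", ", v)); // 0.25, 0.5, 0.75
    }
}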

jcaron