In C/C++, you can set up the following code:

double a, b, c;
...
c = (a + b) / 2;

This does the exact same thing as:

c = (a + b) * 0.5;

I'm wondering which is better to use. Is one operation fundamentally faster than the other?

wodesuck
    In general they don't do the same thing. – R. Martinho Fernandes Jul 26 '13 at 13:56
  • In this case (assuming IEEE binary floats), I think they wind up doing the same thing, since they can both be exactly represented in binary, so there's no loss of precision. – Drew McGowen Jul 26 '13 at 13:57
  • This is highly CPU/architecture dependent. If you're developing only for one architecture, the easiest way is to profile it yourself. – Jan Spurny Jul 26 '13 at 13:57
  • Any decent compiler will optimize these to the same cost (usually `/ 2` becomes `* 0.5`, and then sometimes becomes optimized "divide float by two" microcode). Note in this case (`2` and `0.5` in IEEE format) they are the same, but that may not be true for all constants. – Thomas Jul 26 '13 at 13:58
  • @GrijeshChauhan Tell you quickly? What's _that_ about? – Grant Thomas Jul 26 '13 at 13:58
  • You might be interested in this complete answer: http://stackoverflow.com/questions/226465/should-i-use-multiplication-or-division – HAL9000 Jul 26 '13 at 13:59
  • @GrijeshChauhan so calculate `1/7` and assign it to a constant, and then multiply by that constant inside the loop that `1/7` would otherwise have been used inside. Or just punch it into a calculator with sufficient precision. – Philip Couling Jul 26 '13 at 14:01
  • Because an answer (now deleted) mentioned binary shift for the example in the question, I want to point out that the `>>` binary shift operator does not exist for floating-point operands, but this does not mean that there isn't an efficient binary shift. Moving the binary point of a floating point number is accomplished by incrementing or decrementing the exponent field, not by a bitshift. – Ben Voigt Jul 26 '13 at 14:02
  • @couling Cool man! I can delete my comment No problem :) – Grijesh Chauhan Jul 26 '13 at 14:02
  • @JanSpurny: This is not highly CPU or architecture dependent. It is partly CPU or architecture dependent. Multiplication is much faster than division on many processors, and programmers ought to favor multiplication over division in the absence of reason otherwise. – Eric Postpischil Jul 26 '13 at 14:12
  • The votes down and votes to close are inappropriate. This is a good question with significant performance consequences. High-performance code often favors multiplication over division because multiplication is faster on most modern processors. – Eric Postpischil Jul 26 '13 at 14:13
  • Another reason this is an important question (so vote up) is that **the optimizer cannot make this transformation**, except when the factor is a power of two (or, if non-binary floating point is in use, a number with an exact inverse in the floating-point system). When the factor does not have an exact inverse in the floating-point system, the optimizer is not permitted to change division to multiplication. Thus, programmers should be aware that multiplication is faster than division and should favor multiplication if they know the rounding error in the inverse is acceptable. – Eric Postpischil Jul 26 '13 at 14:42
  • Be very careful. There's more to numerical calculations than just "as fast as possible". There's correctness, readability, round-off error, truncation error, numerical stability, and many other things to consider when doing numerical calculations on a machine. – jason Jul 26 '13 at 15:01
  • @Eric: The votes to close do not indicate that this is a bad question; they indicate that it's been asked and answered already. They are completely appropriate. Downvotes I don't know about, but I suppose one could argue that it shows that wodesuck didn't put in research effort, because he would have found the existing answers easily. – Ben Voigt Jul 26 '13 at 15:43
  • @EricPostpischil: Stating that others' opinions are inappropriate is inappropriate. Besides, you're wrong. – Lightness Races in Orbit Jul 26 '13 at 15:47
  • @EricPostpischil GCC has an `-ffast-math` option for this. – Nikos C. Jul 26 '13 at 16:18
  • @Eric: Yes, I saw a number of votes to close as duplicate. – Ben Voigt Jul 26 '13 at 16:31
  • @Eric: Almost correct. StackOverflow doesn't do that if the comments already contain a link to the question... and HAL9000 has already commented with a link. There *were* votes to close as duplicate, I left one, and there was one before I cast mine. – Ben Voigt Jul 26 '13 at 16:35
  • @BenVoigt Downvotes could be from "Does not show any original research"; the OP is asking about performance, but didn't even try testing it themselves. – Izkata Jul 26 '13 at 17:01

3 Answers


Multiplication is faster than division. At university I was taught that division takes about six times as long as multiplication. The actual timings are architecture dependent, but in general multiplication will never be slower than division. Always optimize your code towards using multiplication if the rounding errors allow.

So, for example, this would typically be slower ...

for (int i = 0; i < arraySize; i++) {
    a[i] = b[i] / x;
}

... than this ...

double y = 1.0 / x;
for (int i = 0; i < arraySize; i++) {
    a[i] = b[i] * y;
}

Of course with rounding errors you'll lose (a little) precision with the second method, but unless you are repeatedly calculating `x = 1 / x`, that's unlikely to cause much of an issue.

Edit:

Just for reference, I've dug up a third-party comparison of operation timings by searching on Google:

http://gmplib.org/~tege/x86-timing.pdf

Look at the numbers for MUL and DIV. They indicate differences of between 5 and 10 times, depending on the processor.

Philip Couling
  • "*Can't understand why this has been down voted so much*" because on SO any question that even comes remotely close to undefined/unspecified/implementation-dependent behaviour is likely immediately downvoted, closed, and swept under the carpet. Apparently pedantry is more relevant than realistic and pragmatic development needs. Slightly tongue-in-cheek, but it sometimes feels that way. – Thomas Jul 26 '13 at 14:14
  • @Thomas yup. What's galling is that this isn't even ambiguous or undefined. Processor manufacturers usually publish the stats. This is where the stats come from for compilers to perform the optimizations. So here "undefined" means "I haven't read it". aaarrrrrrr. Okay. – Philip Couling Jul 26 '13 at 14:27
  • @Thomas: I would guess that the downvotes come from the fact that the question is pretty trivial and has an exact duplicate that's automatically shown in the "related" column to the right (read as: no research effort), the optimization has a high likelihood of being premature (think out-of-order execution and memory bandwidth), and the code snippet contains implicit int-double conversions and extra operations that are unrelated to the question. (That's only a guess, of course; I'm not one of the downvoters, so I can't tell.) – Damon Jul 26 '13 at 15:22
  • @Damon is correct. Guessing at downvoters' motivations may be a fun game but, in this case at least, Thomas lost. – Lightness Races in Orbit Jul 26 '13 at 15:47
  • @Thomas: I tend to go easy on newbies. We all make, or at least made, mistakes at one point or another, and getting scorched for it doesn't make anyone feel welcome. Helping less experienced people is a good thing, and StackOverflow is very good at that. – Mike Dunlavey Jul 26 '13 at 17:14
  • For x86, see http://agner.org/optimize/. div and mul have different ratios of throughput to latency. Both are only 1 uop, so they can overlap well with other operations. Your ~6x more expensive looks about right for throughput, but the latency ratio these days for FP div is more like 4 (on Intel Haswell). Intel Skylake has a very heavily pipelined FP div unit (throughput of one `double` result per 4 cycles). (256b SIMD vectors have lower throughput.) – Peter Cordes Mar 04 '16 at 10:32

It is quite likely that the compiler will convert the divide to a multiply in this case, if it "thinks" that's faster. Dividing by 2 in floating point may also be faster than other floating-point divides. If the compiler doesn't convert it, it MAY be faster to use a multiply, but it's not certain - it depends on the processor itself.

The gain from manually using multiply instead of divide can be quite large in cases where the compiler can't determine that it's "safe" to do so (e.g. 0.1 can't be stored exactly in a binary floating-point number; as a single-precision float it becomes 0.10000000149011612). See below for figures on AMD processors, which can be taken as representative for the class.

To tell whether your compiler does this well or not, why not write a bit of code to experiment? Make sure you write it so that the compiler doesn't just calculate a constant value and discard all the calculation in the loop, though.

Edit:

AMD's optimisation guide for Family 15h processors gives figures of 42 cycles for fdiv and 6 for fmul. The SSE versions are a little closer: 24 (single) or 27 (double) cycles for DIVPS, DIVPD, DIVSS and DIVSD (divide), and 6 cycles for all forms of multiply.

From memory, Intel's figures aren't that far off.

Mats Petersson
  • What the h*** did I get wrong now, that I get downvoted for providing more information? Please tell me so I can learn! – Mats Petersson Jul 26 '13 at 14:50
  • I didn't downvote, have no idea why someone else did, but I hope you don't mind that I corrected your statement about `0.1`. – jason Jul 26 '13 at 14:56
  • It's not quite clear what you mean in the second paragraph (the parenthesis is never closed, and the sentence kind of trails off). But as has been pointed out in the comments to other answers, the compiler typically won't convert divisions to multiplies when dealing with floating-point code. – jalf Jul 26 '13 at 15:03
  • @Jalf: yes, I have clarified the second paragraph a little bit. – Mats Petersson Jul 26 '13 at 15:30

Floating-point multiplication usually takes fewer cycles than floating-point division. But with literal operands, the optimizer is well aware of this kind of micro-optimization.

ouah
  • This is a situation where we cannot rely on the optimizer, if we ask about cases other than the single example in the question. Division by a power of two and multiplication by the inverse are equivalent in binary floating point, but divisions by other factors have no equivalent multiplication, because the reciprocal cannot be exactly represented. This prohibits the optimizer from making the transformation. Thus, programmers should be aware of this and should favor multiplication if they know the rounding error in the inverse is acceptable. – Eric Postpischil Jul 26 '13 at 14:16
  • @EricPostpischil: Does it "help" to give flags for "use fast math" in cases where we don't care too much about precision? Say, for example, in the case of 3D games, you'd expect that the last couple of bits of floating-point precision don't matter. – Mats Petersson Jul 26 '13 at 14:24
  • `x/y —> x * (1/y)` is licensed by `-ffast-math`. Specifically, this is licensed by `-freciprocal-math`, which is one of the "optimizations" included in `-ffast-math`. – Stephen Canon Jul 26 '13 at 14:59