
Is there any advantage to using this code

double x;
double square = pow(x,2);

instead of this?

double x;
double square = x*x;

I prefer x*x, and looking at my implementation (Microsoft's) I find no advantage in pow, because x*x is simpler than pow for the particular case of squaring.

Is there any particular case where pow is superior?

Alessandro Jacopson
  • As I state in my answer, this question takes a slightly different twist in C++11 if you want to use this in a constexpr. – Shafik Yaghmour Aug 06 '14 at 21:48
  • Possible duplicate of [What is more efficient? Using pow to square or just multiply it with itself?](https://stackoverflow.com/questions/2940367/what-is-more-efficient-using-pow-to-square-or-just-multiply-it-with-itself) – underscore_d Oct 01 '17 at 18:04

8 Answers


FWIW, with gcc-4.2 on Mac OS X 10.6 and the -O3 compiler flag,

x = x * x;

and

y = pow(y, 2);

result in the same assembly code:

#include <cmath>

void test(double& x, double& y) {
        x = x * x;
        y = pow(y, 2);
}

Assembles to:

    pushq   %rbp
    movq    %rsp, %rbp
    movsd   (%rdi), %xmm0
    mulsd   %xmm0, %xmm0
    movsd   %xmm0, (%rdi)
    movsd   (%rsi), %xmm0
    mulsd   %xmm0, %xmm0
    movsd   %xmm0, (%rsi)
    leave
    ret

So as long as you're using a decent compiler, write whichever makes more sense to your application, but note that pow(x, 2) can never be faster than the plain multiplication.
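
If you want to check what your own toolchain does, you can ask the compiler to emit assembly directly and inspect it; for example, with g++ (the file name is just a placeholder):

g++ -O3 -S test.cpp
cat test.s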

Alnitak
  • `-O6` is the same as `-O3` by the way. g++ does not go up to eleven. – Tobu Jun 12 '11 at 09:41
  • Apparently it doesn't even go up to four :) – Karl Knechtel Jun 12 '11 at 09:53
  • @Alnitak +1 I like very much your approach of looking at the assembly code. – Alessandro Jacopson Jun 12 '11 at 10:30
  • I've always been wary of looking at assembly to prove program behaviour. A toolchain (and, most importantly, the standard) gives no guarantees about the assembly output, so it could change at any time. It's hardly a foolproof way to prove anything about a program, other than the specific two executables you happen to be looking at right now. – Lightness Races in Orbit Jun 12 '11 at 15:12
  • @Tomalak which is exactly why the first line of my answer says what it does – Alnitak Jun 12 '11 at 15:46
  • @Alnitak: Sure, you declared it. That doesn't change the fact that I believe it reduces the usefulness of the answer -- and, more to the point, this technique in general (mostly directed at uvts_cvs). – Lightness Races in Orbit Jun 12 '11 at 15:48
  • @Karl Knechtel: clang++ does go up to four, though not beyond that (yet)! – John Zwinck Jun 12 '11 at 16:05
  • Alnitak - I fail to see the point of your "don't do this:" warning. You checked the behavior of the compiler for the case of an integer exponent with a value known at compile time; obviously, the generated code will look very different for a general floating-point exponent. – ackb Jun 13 '11 at 05:50
  • @Tomalak Geret'kal, thank you for your warning, anyway I am aware of the limits of the approach. – Alessandro Jacopson Jun 13 '11 at 07:07
  • @LightnessRacesinOrbit: I would say this answer addresses the question precisely. The OP wants to know which solution is better _in practice_: looking at the assembler code is one way to find this out. – TonyK Dec 15 '14 at 09:30
  • @TonyK: How does it address the question "precisely"? This assembly code is from a different compiler on a different platform. If you want to focus on practicality then fine, let's do that: how does assembly from GCC on OSX tell the OP anything whatsoever about the code his Microsoft compiler will produce? It doesn't! We should either answer in broad strokes _or_ provide specific practical details _that match the question_. As it stands, this answer is a huge misdirect... Now, it's not the author's fault as his "FWIW" suggests he knows this; it's the fault of all those upvoters and the tick :( – Lightness Races in Orbit Dec 15 '14 at 22:07
  • @LightnessRacesinOrbit OTOH, if your compiler can't make the same optimisations as GCC on OSX, get a better compiler ;-) – Alnitak Dec 15 '14 at 22:56
  • I finally downvoted this, five years after the fact. There are a number of places in C++ where compiling with a very high optimization level is suboptimal. There are a number of organizations (e.g. NASA) that forbid enabling any optimizations, period, because in the long forgotten past, some poorly formed optimizing compiler crashed spacecraft. Something that makes a spacecraft crash will be remembered for a long, long time, and NASA has *long* memories, even if it was a compiler bug from the previous millennium. – David Hammen Jun 14 '16 at 16:23
  • @DavidHammen FWIW, even `-O1` with gcc is sufficient to optimise `pow(x, 2)` – Alnitak Jun 14 '16 at 16:24
  • @Alnitak -- That depends on the compiler. You will not see gcc or clang used to build flight software. – David Hammen Jun 14 '16 at 16:25
  • @DavidHammen I'd hope not to even see C or C++ used for those at all (although the JPL coding guidelines for C to make interesting reading). – Alnitak Jun 14 '16 at 16:27
  • For what it's worth, **Microsoft's compiler absolutely will not generate the same code** for `pow(x, 2)` and `x * x`. It always calls the `pow` function, resulting in a series of branches and a lot of slow code. If you mean `x * x`, then write it as `x * x`. Avoid the use of overly generic functions just to be clever. We're talking about a *massive* speed difference here. Your answer is thus quite misleading. If you're going to analyze disassembly, you at least need to disassemble the code generated by the relevant compiler. Especially since the asker does actually bother to mention it. – Cody Gray - on strike Nov 01 '16 at 08:59
  • @CodyGray you appear to be in violent agreement with me. Read the last paragraph of my answer again. This also appears to confirm that Microsoft's compiler isn't a "decent" compiler. – Alnitak Nov 01 '16 at 09:46

std::pow is more expressive if you mean x², x*x is more expressive if you mean x*x, especially if you are implementing, e.g., a scientific paper and readers should be able to check your implementation against the paper. The difference may be subtle for x*x vs. x², but I think that using named functions in general increases code expressiveness and readability.

On modern compilers, e.g. g++ 4.x, std::pow(x,2) will be inlined, if it is not already a compiler builtin, and strength-reduced to x*x. If that doesn't happen by default and you don't care about IEEE floating-point conformance, check your compiler's manual for a fast-math switch (for g++, -ffast-math).
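
If you want the expressiveness of a named operation without betting on strength reduction, one option is a small named helper; a minimal sketch (square and the kinetic example are made-up names, not from any library):

// Reads like the math at the call site, compiles to one multiply.
inline double square(double x) { return x * x; }

double kinetic(double m, double v) {
    return 0.5 * m * square(v);  // intent is visible without pow's generality
}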


Sidenote: It has been mentioned that including math.h increases program size. My answer was:

In C++, you #include <cmath>, not math.h. Also, if your compiler is not stone-old, it will increase your program's size only by what you actually use (in the general case); and if your implementation of std::pow just inlines to the corresponding x87 instructions, or a modern g++ strength-reduces it to x*x, then there is no relevant size increase. Also, program size should never, ever dictate how expressive you make your code.

A further advantage of cmath over math.h is that with cmath, you get a std::pow overload for each floating point type, whereas with math.h you get pow, powf, etc. in the global namespace, so cmath increases adaptability of code, especially when writing templates.
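
For instance, in a template the std::pow overloads let each floating-point type resolve to its own precision; a small sketch (squared_magnitude is an invented example function):

#include <cmath>

// With <cmath>, std::pow(float, float) returns float, and the double and
// long double overloads behave likewise; with math.h's global pow, every
// call would be forced through double (or need powf/powl spelled out).
template <typename T>
T squared_magnitude(T re, T im) {
    return std::pow(re, T(2)) + std::pow(im, T(2));
}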

As a general rule: prefer expressive and clear code over code justified by dubious performance and binary-size arguments.

See also Knuth:

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

and Jackson:

The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.

Sebastian Mach
  • +1, It is indeed important to adhere to `Principle of Least Astonishment` while writing any source code. – Alok Save Jun 12 '11 at 10:17
  • +1 for all sorts of things, especially that you bothered with the proper superscript-2 character :) – Lightness Races in Orbit Jun 12 '11 at 15:14
  • @Tomalak Geret'kal: On many (most?) Linux Desktop Distros, one can simply x¹²³⁴⁵⁶⁷⁸⁹⁰, which of course is quite awesome when you need it :) On the other hand, when I am on windows, I am not shy of using and – Sebastian Mach Jun 13 '11 at 17:38
  • I disagree with Jackson on that. Optimization (like using a better algorithm) is something that should be done all the time. – Cœur Mar 31 '14 at 13:53
  • @Cœur: The problem is that this leads to write-only code of the scariest sort. The sort which cripples companies because developers are not willing to sacrifice all their brain potential even on the most unimportant thing. I once worked with a company that had lost all of their developers. No developers, no product, no money, no company. How far does "always optimize" go? Because, if someone mandated that, I would dive down to bit-trickery, cache black magick and SSE assembly. I would write huge macros and turing complete templates, duplicate code for every special case, ... no, I'd quit. – Sebastian Mach Jun 17 '16 at 08:56
Stay on your level of language. Using a better algorithm can be achieved without assembly or macros. And I believe long non-optimised code is harder to read than short optimised code. – Cœur Jun 17 '16 at 13:49
  • @Cœur: I wonder if you have ever made use of SSE intrinsics. Those _are_ on the C++ language level, yet they are not nice to read, nor easy to write, nor easy to maintain. I bet if you give me the source code to one of your programs, I can easily optimize it within language bounds, with microoptimizations of all kinds, fast inverse square root, loop inversions, code reordering (wrt caching) etc etc., and you will never be able to maintain it anymore. – Sebastian Mach Jun 23 '16 at 09:31
  • @Cœur: ... you won't keep deadlines, everything becomes so expensive that your competitor has an easy time beating you on the market, experienced developers won't want to work with your code (and go to the competition), and so on. I am not sure if you really grasp the magnitude of the effect of optimizing everything, even within language bounds. Hey, let's replace all multiplications with shifts, additions with bitwise operations, threading operations with raw pthreads (have fun maintaining RAII), etc etc etc. No business owner wants to pay for that. – Sebastian Mach Jun 23 '16 at 09:36
  • If you read the rest of Knuth you might come to understand that the quote you mention is not an excuse to skip optimization but rather to do it at the right time. – hsmyers May 21 '17 at 15:38
  • @hsmyers: If you read my post and comments completely instead of spreading unfounded claims, you might come to understand that I did not recommend to never optimize. Even the quotes I included explicitly state "about 97% of the time" and "Don't do it yet". Further, "Prefer expressive and clear code" is explicitly not the same as "Always write expressive and clear code". The only thing I come to wonder: Why is it that so many people don't read to the end before attempting to attack authors? – Sebastian Mach May 25 '17 at 07:23
  • @SebastianMach I should have been clear that I wasn't particularly speaking of you but more of those who toss out that single line on their way to the next big project. I started coding with IBM 360BAL when it wasn't a matter of 'premature' optimization. And actually I did read what you had written, but I missed the interpretation of the 2nd rule—sorry! Oh and believe me when I say this was not an attack, should I ever you will well remember the difference… – hsmyers May 25 '17 at 14:59
  • @hsmyers: I see. I take my hat off and sincerely apologize for my misinterpretation. – Sebastian Mach May 28 '17 at 17:56

Not only is x*x clearer, it will certainly be at least as fast as pow(x,2).

David Heffernan
  • It will certainly be faster with virtually every compiler, often by more than an order of magnitude. – Sven Marnach Jun 12 '11 at 09:21
  • @Sven I think that too, but I guess a compiler could optimise and inline the pow to a simple multiply. So I was being cautious! – David Heffernan Jun 12 '11 at 09:23
  • @David indeed compilers _can_ optimise `pow(x, 2)` - see my answer. – Alnitak Jun 12 '11 at 09:33
  • @Sven: How do you define "virtually every compiler"? If you don't care about IEEE conformance, you can use g++ -ffast-math, so it will strength-reduce pow(x,2) to x*x, so there is no difference. – Sebastian Mach Jun 12 '11 at 09:33
  • We once even had to replace `pow(x, -2.5)` by `1.0 / (x*x*sqrt(x))` for performance reasons. Gave an incredible speed-up on the particular machine we did the computations on. (compiler was gcc 4.4, IIRC) – Sven Marnach Jun 12 '11 at 09:34

This question touches on one of the key weaknesses of most implementations of C and C++ regarding scientific programming. After having switched from Fortran to C about twenty years ago, and later to C++, this remains one of those sore spots that occasionally makes me wonder whether that switch was a good thing to do.

The problem in a nutshell:

  • The easiest way to implement pow is Type pow(Type x, Type y) { return exp(y*log(x)); }
  • Most C and C++ compilers take the easy way out.
  • Some might 'do the right thing', but only at high optimization levels.
  • Compared to x*x, the easy way out with pow(x,2) is extremely expensive computationally and loses precision.

Compare to languages aimed at scientific programming:

  • You don't write pow(x,y). These languages have a built-in exponentiation operator. That C and C++ have steadfastly refused to implement an exponentiation operator makes the blood of many scientific programmers boil. To some diehard Fortran programmers, this alone is reason to never switch to C.
  • Fortran (and other languages) are required to 'do the right thing' for all small integer powers, where small is any integer between -12 and 12. (The compiler is non-compliant if it can't 'do the right thing'.) Moreover, they are required to do so with optimization off; see the sketch after this list.
  • Many Fortran compilers also know how to extract some rational roots without resorting to the easy way out.
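
For small integer powers, 'doing the right thing' means exponentiation by squaring rather than a call through exp(y*log(x)); a hand-rolled sketch of the idea (ipow is my own name, not what any compiler actually emits):

// Raise x to a small non-negative integer power using O(log n)
// multiplications instead of n-1; e.g. x**8 comes from repeated
// squaring (x, x**2, x**4, x**8) rather than seven multiplies.
double ipow(double x, unsigned n) {
    double result = 1.0;
    while (n != 0) {
        if (n & 1)
            result *= x;   // fold this power in when the current bit is set
        x *= x;            // square once per bit of the exponent
        n >>= 1;
    }
    return result;
}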

There is an issue with relying on high optimization levels to 'do the right thing'. I have worked for multiple organizations that have banned the use of optimization in safety-critical software. Memories can be very long (multiple decades long) after losing 10 million dollars here, 100 million there, all due to bugs in some optimizing compiler.

IMHO, one should never use pow(x,2) in C or C++. I'm not alone in this opinion. Programmers who do use pow(x,2) typically get reamed big time during code reviews.

David Hammen
  • The absence of a dedicated pow operator (which isn't that hard to take) is more than compensated by operators for more complicated types (as vectors and matrices). And staying away from optimization is quite counter-productive (and should surely not be done in a scientific application). If your code doesn't work when optimized, it's just the wrong code. There are certain things you just have to rely on (like a working compiler). But I agree in not relying on that high-level optimization features not all compilers support (like the mentioned pow optimization, I suppose). – Christian Rau Jun 12 '11 at 14:51
  • @christian I'm greedy. I want it all, operator overloading and a pow operator! – David Heffernan Jun 12 '11 at 14:58
  • @Christian: You just touched on another sore spot. C++ has terrible support for vectors and matrices as used in scientific programming. As a starter, std::vector is not a vector. And yes, to some diehard Fortran programmers (or more importantly, diehard project managers who ban the use of C/C++), the absence of a dedicated exponentiation operator is too hard to take. – David Hammen Jun 12 '11 at 15:16
  • @David C++ supports vectors and matrices through writing your own classes (or using existing libraries) and this with perfectly useable operators on them (and with expression templates at nearly no extra cost). But yes, nobody wants to use a `std::vector` (not even a `std::valarray`) for high performance mathematical vectors. – Christian Rau Jun 12 '11 at 17:12
  • Letting *me* write my own classes is quite different from C++ supporting matrices and vectors. Talk to diehard Fortran programmers. Lack of an exponentiation operator is reason number 2 for why they look down upon C/C++. Reason #1 is lack of support for matrices and vectors. Reason #3 is the incredibly sparse math library provided by the language. – David Hammen Jun 12 '11 at 20:15
  • Diehard Fortran programmers are, for the particular case of exponentiation, wrong. "The right thing" is to call the `pow` function, *not* to generate a sequence of multiplies. The worst-case error for exponentiation by repeated multiplication is either O(n) ulps or O(log(n)) ulps, depending on the sequence used. With a good C math library (which any reputable vendor provides), the worst-case error has a constant bound less than 1 ulp. Only for the specific exponents -1, 0, 1, 2 might multiple/divide sequences be more accurate, and a good compiler will inline those in C or C++ as well. – Stephen Canon Jun 13 '11 at 22:24
  • @Stephen: That is absolutely wrong. Look at a typical scientific program. You'll see a lot of x**2, a few x**3 and x**4, some x**(-2), and maybe a few others. The predominant use of the exponentiation operator is with small integral powers. Fortran compilers are required to handle these cases optimally, where optimal means the fewest number of multiplications and at most one division. In other words, x**8 is not calculated as x*x*x*x*x*x*x*x. Only three multiplications are needed to compute x**8. Every once in a while you will see x**y, and for that the general case is needed. – David Hammen Jun 14 '11 at 10:25
  • "easiest way to implement pow is Type pow(Type x; Type y) {return exp(y*log(x));}" --> except when `x == 0` and `x < 0, y has integral value`. – chux - Reinstate Monica Nov 16 '18 at 04:16

In C++11 there is one case where there is an advantage to using x * x over std::pow(x,2), and that case is where you need to use it in a constexpr:

constexpr double  mySqr( double x )
{
      return x * x ;
}

As we can see, std::pow is not marked constexpr and so it is unusable in a constexpr function.
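
For example, with the mySqr above (a quick illustration; the commented-out line shows what a conforming compiler rejects):

constexpr double area = mySqr( 4.0 );           // OK: evaluated at compile time
static_assert( mySqr( 3.0 ) == 9.0, "square" ); // usable in constant expressions
// constexpr double bad = std::pow( 4.0, 2.0 ); // error: std::pow is not constexpr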

Otherwise from a performance perspective putting the following code into godbolt shows these functions:

#include <cmath>

double  mySqr( double x )
{
      return x * x ;
}

double  mySqr2( double x )
{
      return std::pow( x, 2.0 );
}

generate identical assembly:

mySqr(double):
    mulsd   %xmm0, %xmm0    # x, D.4289
    ret
mySqr2(double):
    mulsd   %xmm0, %xmm0    # x, D.4292
    ret

and we should expect similar results from any modern compiler.

Worth noting that gcc currently treats pow as constexpr (also covered here), but this is a non-conforming extension; it should not be relied upon and will probably change in later releases of gcc.
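
For illustration, something like the following currently compiles with gcc because of that extension, but is rejected by conforming compilers such as clang, which is exactly why it should not be relied upon:

#include <cmath>

// Accepted by current gcc as a non-conforming extension; ill-formed per the standard:
constexpr double nine = std::pow( 3.0, 2.0 );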

Shafik Yaghmour

x * x will always compile to a simple multiplication. pow(x, 2) is likely, but by no means guaranteed, to be optimised to the same. If it's not optimised, it will likely fall back to a slow general raise-to-power math routine. So if performance is your concern, you should always favour x * x.
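
If performance is your concern, the safest route is to measure with the compiler you actually ship with; a rough sketch of such a check (numbers will vary by toolchain and flags):

#include <chrono>
#include <cmath>
#include <cstdio>

int main() {
    const int N = 10000000;
    volatile double sink = 0.0;  // volatile keeps the loops from being optimised away

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) { double x = i * 1e-3; sink = x * x; }
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i) { double x = i * 1e-3; sink = std::pow(x, 2); }
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    std::printf("x*x:       %lld us\n", (long long)std::chrono::duration_cast<us>(t1 - t0).count());
    std::printf("pow(x, 2): %lld us\n", (long long)std::chrono::duration_cast<us>(t2 - t1).count());
}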

AshleysBrain

IMHO:

  • Code readability
  • Code robustness - it will be easier to change to pow(x, 6), and maybe a floating-point mechanism for a specific processor is implemented, etc.
  • Performance - if there is a smarter and faster way to calculate this (using assembler or some kind of special trick), pow will do it for you. You won't... :)

Cheers

Hertzel Guinness
  • your argument about pow(x,6) is not realistic. Also the compiler can optimise x*x too! – David Heffernan Jun 12 '11 at 09:27
  • @David I agree it's weaker than the other, but the combination of those two is strong enough. The optimization argument is true for both cases; I argue that it's a matter of taste (actually the whole question is...) which is better. I favor the general principle of "don't redo what someone else has done". – Hertzel Guinness Jun 12 '11 at 10:20
  • discussion would be moot if the language had a built-in exponentiation operator like Fortran, say – David Heffernan Jun 12 '11 at 11:12
  • @David Heffernan: You hit on a key sore spot amongst scientific programmers regarding the use of C/C++ for scientific programming. See my answer for more. – David Hammen Jun 12 '11 at 12:12
  • I also dispute readability. pow is horrible in any large equation. If operator overloading is considered beneficial for readability then pow should be considered harmful. – David Heffernan Jun 12 '11 at 12:18
  • @David Heffernan: IMHO, `pow(x,2)` is no more horrible than `x*x`. At least it gives some grouping; if you have readability problems, you should split up the computation anyway; const comes to the rescue: `const float x = pow(x,2) / sqrt(y)` -> `const float num = pow(x,2), denom = sqrt(y), x = num/denom;`. Best of course would be `x²`, but this is C++. – Sebastian Mach Jun 14 '11 at 12:08

I would probably choose std::pow(x, 2) because it could make my code refactoring easier. And it would make no difference whatsoever once the code is optimized.

Now, the two approaches are not identical. This is my test code:

#include<cmath>

double square_explicit(double x) {
  asm("### Square Explicit");
  return x * x;
}

double square_library(double x) {
  asm("### Square Library");  
  return std::pow(x, 2);
}

The asm("text"); call simply writes comments to the assembly output, which I produce using (GCC 4.8.1 on OS X 10.7.4):

g++ example.cpp -c -S -std=c++11 -O[0, 1, 2, or 3]

You don't need -std=c++11, I just always use it.

First: when debugging (with zero optimization), the assembly produced is different; this is the relevant portion:

# 4 "square.cpp" 1
    ### Square Explicit
# 0 "" 2
    movq    -8(%rbp), %rax
    movd    %rax, %xmm1
    mulsd   -8(%rbp), %xmm1
    movd    %xmm1, %rax
    movd    %rax, %xmm0
    popq    %rbp
LCFI2:
    ret
LFE236:
    .section __TEXT,__textcoal_nt,coalesced,pure_instructions
    .globl __ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_
    .weak_definition __ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_
__ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_:
LFB238:
    pushq   %rbp
LCFI3:
    movq    %rsp, %rbp
LCFI4:
    subq    $16, %rsp
    movsd   %xmm0, -8(%rbp)
    movl    %edi, -12(%rbp)
    cvtsi2sd    -12(%rbp), %xmm2
    movd    %xmm2, %rax
    movq    -8(%rbp), %rdx
    movd    %rax, %xmm1
    movd    %rdx, %xmm0
    call    _pow
    movd    %xmm0, %rax
    movd    %rax, %xmm0
    leave
LCFI5:
    ret
LFE238:
    .text
    .globl __Z14square_libraryd
__Z14square_libraryd:
LFB237:
    pushq   %rbp
LCFI6:
    movq    %rsp, %rbp
LCFI7:
    subq    $16, %rsp
    movsd   %xmm0, -8(%rbp)
# 9 "square.cpp" 1
    ### Square Library
# 0 "" 2
    movq    -8(%rbp), %rax
    movl    $2, %edi
    movd    %rax, %xmm0
    call    __ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_XsrSt12__is_integerIS2_E7__valueEE6__typeENS4_IS3_XsrS5_IS3_E7__valueEE6__typeEE6__typeES2_S3_
    movd    %xmm0, %rax
    movd    %rax, %xmm0
    leave
LCFI8:
    ret

But when you produce the optimized code (even at the lowest optimization level for GCC, meaning -O1) the code is just identical:

# 4 "square.cpp" 1
    ### Square Explicit
# 0 "" 2
    mulsd   %xmm0, %xmm0
    ret
LFE236:
    .globl __Z14square_libraryd
__Z14square_libraryd:
LFB237:
# 9 "square.cpp" 1
    ### Square Library
# 0 "" 2
    mulsd   %xmm0, %xmm0
    ret

So, it really makes no difference unless you care about the speed of unoptimized code.

Like I said: it seems to me that std::pow(x, 2) more clearly conveys your intentions, but that is a matter of preference, not performance.

And the optimization seems to hold even for more complex expressions. Take, for instance:

double explicit_harder(double x) {
  asm("### Explicit, harder");
  return x * x - std::sin(x) * std::sin(x) / (1 - std::tan(x) * std::tan(x));
}

double implicit_harder(double x) {
  asm("### Library, harder");
  return std::pow(x, 2) - std::pow(std::sin(x), 2) / (1 - std::pow(std::tan(x), 2));
}

With -O1 (the lowest optimization level), the assembly is again identical:

# 14 "square.cpp" 1
    ### Explicit, harder
# 0 "" 2
    call    _sin
    movd    %xmm0, %rbp
    movd    %rbx, %xmm0
    call    _tan
    movd    %rbx, %xmm3
    mulsd   %xmm3, %xmm3
    movd    %rbp, %xmm1
    mulsd   %xmm1, %xmm1
    mulsd   %xmm0, %xmm0
    movsd   LC0(%rip), %xmm2
    subsd   %xmm0, %xmm2
    divsd   %xmm2, %xmm1
    subsd   %xmm1, %xmm3
    movapd  %xmm3, %xmm0
    addq    $8, %rsp
LCFI3:
    popq    %rbx
LCFI4:
    popq    %rbp
LCFI5:
    ret
LFE239:
    .globl __Z15implicit_harderd
__Z15implicit_harderd:
LFB240:
    pushq   %rbp
LCFI6:
    pushq   %rbx
LCFI7:
    subq    $8, %rsp
LCFI8:
    movd    %xmm0, %rbx
# 19 "square.cpp" 1
    ### Library, harder
# 0 "" 2
    call    _sin
    movd    %xmm0, %rbp
    movd    %rbx, %xmm0
    call    _tan
    movd    %rbx, %xmm3
    mulsd   %xmm3, %xmm3
    movd    %rbp, %xmm1
    mulsd   %xmm1, %xmm1
    mulsd   %xmm0, %xmm0
    movsd   LC0(%rip), %xmm2
    subsd   %xmm0, %xmm2
    divsd   %xmm2, %xmm1
    subsd   %xmm1, %xmm3
    movapd  %xmm3, %xmm0
    addq    $8, %rsp
LCFI9:
    popq    %rbx
LCFI10:
    popq    %rbp
LCFI11:
    ret

Finally: the x * x approach does not require including <cmath>, which makes your compilation ever so slightly faster, all else being equal.

Escualo