Fastest way to flip the sign of a double / float in C

Question

What is the fastest way to flip the sign of a double (or float) in C?

I thought that accessing the sign bit directly would be the fastest way and found the following:

double a = 5.0;
*(__int64*)&a |= 0x8000000000000000;
// a = -5.0

float b = 3.0;
*(int*)&b |= 0x80000000;
// b = -3.0

However, the above does not work for negative numbers:

double a = -5.0;
*(__int64*)&a |= 0x8000000000000000;
// a = -5.0

That code is completely non-portable. Not only do you invoke various platform-dependent implementations of float, you also make your code dependent on endianness. – Lundin, Mar 06 '11 at 21:11
It's also likely to kill performance for a floating-point number stored in a register: it would need to be moved to an integer register, have the operation performed, and then be moved back to the FP (x87/SSE) register. – Yann Ramin, Mar 06 '11 at 21:26
I'm curious to know what calculation has floating point negation as its performance bottleneck — David Heffernan, Mar 06 '11 at 22:10
@Yann Ramin: GCC optimizes '-a' into 'XORPS %XMM1, %XMM0' (GNU syntax), with XMM1 holding the negation bitmask on x86_64 and uses FCHS on x86_32. — datenwolf, Mar 06 '11 at 22:29
Your code is also a horrible violation of aliasing rules meaning it *will not do what you want* on modern compilers. — R.. GitHub STOP HELPING ICE, Mar 07 '11 at 05:48
@David You're right, this certainly isn't the bottleneck of my calculation. I'm just trying to squeeze the last bit of performance out of a 5 day Monte-Carlo integration. The 6D unbound integration domain [-∞,∞]^6 is transformed onto [0,1]^6. To calculate the integrand once then requires 63 sign flips in the 6D coordinate vector. The Monte Carlo sample number is usually 10^4-10^5 and the integration has to be carried out for 10^6 different parameter sets. So that makes at least 10^11 sign flips in the overall process. — hennes, Mar 07 '11 at 11:37
@hennes I'd be astonished if you would even notice the time taken by the sign flips. – David Heffernan, Mar 07 '11 at 11:39
@David Yeah, you're right. It was just an idea. And I was curious why that bit twiddling I found didn't work for negative numbers. Thx everyone for your help! :) — hennes, Mar 07 '11 at 11:44

datenwolf · Accepted Answer · 2011-03-13T09:39:13.957

46

Any decent compiler will implement this bit manipulation if you just prepend a negation operator, i.e. -a. Anyway, you're OR-ing the bit, which can only set it. You should XOR it to toggle it. This is what the compilers I tested (GCC, MSVC, Clang) do anyway. So just do yourself a favour and write -a
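To illustrate the OR-versus-XOR point, here is a minimal sketch. It assumes that double uses the IEEE 754 binary64 layout, which C does not guarantee, and it copies the bits with memcpy instead of the pointer cast from the question so that the behaviour is well defined. The names set_sign_or and flip_sign_xor are hypothetical, chosen only for this example.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64, so bit 63 is the sign bit. */
#define SIGN_MASK_64 UINT64_C(0x8000000000000000)

/* OR can only set the sign bit, so the result is always negative. */
double set_sign_or(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* read the bit pattern without aliasing */
    bits |= SIGN_MASK_64;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* XOR toggles the sign bit, so the sign is flipped in both directions. */
double flip_sign_xor(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= SIGN_MASK_64;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Under that assumption, set_sign_or(-5.0) stays at -5.0, which is the behaviour seen in the question, while flip_sign_xor(-5.0) gives 5.0. In real code, plain -a remains the simpler choice.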

EDIT: Be aware that C doesn't enforce any specific floating-point format, so any bit manipulation on non-integral C variables will eventually result in erroneous behaviour.

EDIT 2 due to a comment: This is the negation code GCC emits for x86_64

.globl neg
    .type   neg, @function
neg:
.LFB4:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movss   %xmm0, -4(%rbp)
    movss   -4(%rbp), %xmm1
    movss   .LC0(%rip), %xmm0
    xorps   %xmm1, %xmm0  /* <----- Sign flip using XOR */
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE4:
    .size   neg, .-neg

It should be noted that xorps is an XOR designed for floating-point values, taking care of special conditions. It's an SSE instruction.

edited Mar 13 '11 at 09:39

answered Mar 06 '11 at 20:59

datenwolf


`Any decent compiler will implement this bit manipulation` this is wrong. The compiler *needs* to account for `NaN` values. `-NaN` is still a NaN, but NaN xor sign bit is no longer a NaN. Compilers can't and won't make this optimization. – Inverse Mar 13 '11 at 04:58
1

Did you look at the actual binary output? I did it for the mentioned compilers and they do it. – datenwolf Mar 13 '11 at 09:30
Thanks for posting the disassembly. VC does not xor the sign bit, it uses `fchs`. I'm somewhat shocked that gcc does this... is this with -O2? NaNs not propagating can lead to serious bugs. – Inverse Mar 17 '11 at 18:39
What's the output for `-log(-1.0f)`? `nan` or `2.14748e+09`? – Inverse Mar 17 '11 at 18:51
@Inverse: XORPS is an instruction designed especially for operation on floats, provisioning for NaNs. Also GCC uses FCHS if compiling for 32 bit target. And no, I didn't use any optimizations. – datenwolf Mar 17 '11 at 20:06
9

@inverse: What are you talking about? In IEEE, NaN is encoded with the exponent portion set to all ones, and any non-zero mantissa (mantissa zero means infinity). The sign bit is irrelevant. Quiet or signalling is (in practice) the highest magnitude bit in the mantissa. Again, not the sign bit. – wnoise Oct 29 '11 at 01:54
@Inverse: GCC does not do it for any `-O` but for `-ffast-math`. See [Negative NaN is not a NaN?](http://stackoverflow.com/q/3596622/183120). – legends2k Mar 28 '14 at 15:36
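The encoding point raised in these comments can be checked with a small sketch. It assumes that float uses the IEEE 754 binary32 layout (not guaranteed by C), and the helper names float_bits and flip_sign_bit are hypothetical. Since a NaN is identified by an all-ones exponent and a non-zero mantissa, toggling only the sign bit should leave it a NaN.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32, so bit 31 is the sign bit. */
#define SIGN_MASK_32 UINT32_C(0x80000000)

/* Return the raw bit pattern of a float (well-defined copy via memcpy). */
uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Toggle only the sign bit; exponent and mantissa are left untouched. */
float flip_sign_bit(float x)
{
    uint32_t bits = float_bits(x) ^ SIGN_MASK_32;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

With these helpers, isnan(flip_sign_bit(NAN)) is expected to be non-zero on such a platform, and infinities simply change sign.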

score 34 · Answer 2 · answered Mar 06 '11 at 21:00

a=-a

answered Mar 06 '11 at 21:00

David Heffernan


16

This would be the most efficient way. Bit manipulations of floats are not portable; never touch floats with bitwise operators. Leave that to the compiler. – Lundin Mar 06 '11 at 21:07
This seriously borders on NAA. The comment does all the work that the answer doesn’t. – Cimbali Jun 25 '22 at 11:20

score 5 · Answer 3 · edited May 23 '17 at 10:27

This code has undefined behaviour because it violates the strict aliasing rule (see "What is the strict aliasing rule?"). To do this in a well-defined way, you will have to rely on the compiler optimizing it for you.

edited May 23 '17 at 10:27

Community


answered Mar 06 '11 at 21:01

Maister


score 4 · Answer 4 · answered Mar 06 '11 at 20:58

If you want a portable way, just multiply by -1 and let the compiler optimise it.

answered Mar 06 '11 at 20:58

qrdl


5

Is there a difference between a*=-1 and a=-a? I would suspect the first variant to actually carry out the multiplication instead of just flipping the sign bit and hence to be slower. – hennes Mar 06 '11 at 21:57
4

@hennes a) Whether there is a difference depends upon the compiler implementation. b) Why would you suspect that the optimizer is inoperative? – Jim Balter Mar 07 '11 at 01:13
@Jim Yes, seems you're right. Sorry, I was just curious. Guess I should strengthen my trust in modern compilers. – hennes Mar 07 '11 at 11:19
1

There is a big difference. Multiplication is considered an arithmetical operation and must follow a specific set of rules, like setting the floating-point exception flags. Negation is not, and can thus be implemented by toggling the sign bit. (For integers, however, they could be considered the same.) – Lindydancer Mar 07 '11 at 23:25
2

@Lindydancer: That's what I thought, too. However, I checked it on GCC and Intel's compiler and couldn't find a difference in performance between a=-a, a*=-1 and a*=-1.0. So I guess they are all optimized the same way by the compiler. – hennes Mar 08 '11 at 07:25
@hennes: It all depends on the target architecture. If there is an FPU instruction that can do everything, the performance will be the same. However, for many microcontrollers without an FPU there might be a difference. – Lindydancer Mar 08 '11 at 08:10
@Lindydancer: Agreed. I didn't think of such situations. – hennes Mar 08 '11 at 08:37
You could have a difference if you have an accessor to your value: `x.get() *= -1.0` or `x.get() = -x.get()` – Caduchon Mar 15 '21 at 09:27
@Caduchon `x.get()` is not an lvalue, so your example makes no sense – qrdl Mar 15 '21 at 09:33
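The three spellings compared in the comments above can be put side by side in a short sketch. The function names are hypothetical. It only compares resulting values and signs, not floating-point exception flags or generated instructions, so it says nothing about speed on any particular target.

```c
#include <math.h>

/* Unary minus: the form recommended in the answers. */
double negate_unary(double a) { return -a; }

/* Multiplication by an integer constant, converted to double by the compiler. */
double negate_mul_int(double a) { a *= -1; return a; }

/* Multiplication by a floating-point constant. */
double negate_mul_double(double a) { a *= -1.0; return a; }

/* Non-zero if x and y compare equal and carry the same sign (tells 0.0 from -0.0). */
int same_value_and_sign(double x, double y)
{
    return x == y && !signbit(x) == !signbit(y);
}
```

For ordinary inputs such as 5.0, 0.0 and infinity, all three are expected to agree in value and sign under default compiler settings. To see whether a given compiler emits the same instructions for them, its assembly output has to be inspected.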
