
Many algorithms require computing (-1)^n (both integers), usually as a factor in a series; that is, a factor that is -1 for odd n and 1 for even n. In a C++ environment, one often sees:

#include <iostream>
#include <cmath>

int main() {
    int n = 13;
    std::cout << std::pow(-1, n) << std::endl;
}

Which is better, or the usual convention (or something else)?

std::pow(-1, n)
std::pow(-1, n%2)
(n%2?-1:1)
(1-2*(n%2))  // (gives incorrect value for negative n)

EDIT:

In addition, user @SeverinPappadeux proposed another alternative based on (global?) array lookups. My version of it is:

const int res[] {-1, 1, -1}; // three elements are needed for negative modulo results
const int* const m1pow = res + 1; 
...
m1pow[n%2]

This is probably not going to settle the question, but by looking at the emitted code we can discard some options.

First, without optimization, the final contenders are:

   1 - ((n & 1) << 1);

(7 operations, no memory access)

  mov eax, DWORD PTR [rbp-20]
  add eax, eax
  and eax, 2
  mov edx, 1
  sub edx, eax
  mov eax, edx
  mov DWORD PTR [rbp-16], eax

and

   retvals[n&1];

(5 operations, memory (registers?) access)

  mov eax, DWORD PTR [rbp-20]
  and eax, 1
  cdqe
  mov eax, DWORD PTR main::retvals[0+rax*4]
  mov DWORD PTR [rbp-8], eax

Now with optimization (-O3)

   1 - ((n & 1) << 1);

(4 operations, no memory access)

  add edx, edx
  mov ebp, 1
  and edx, 2
  sub ebp, edx

and

  retvals[n&1];

(4 operations, memory (registers?) access)

  mov eax, edx
  and eax, 1
  movsx rcx, eax
  mov r12d, DWORD PTR main::retvals[0+rcx*4]

and

   n%2?-1:1;

(4 operations, no memory access)

  cmp eax, 1
  sbb ebx, ebx
  and ebx, 2
  sub ebx, 1

The tests are here. I had to do some acrobatics to get meaningful code that doesn't elide operations altogether.

Conclusion (for now)

So, in the end, it depends on the level of optimization and on expressiveness:

  • 1 - ((n & 1) << 1); is always good but not very expressive.
  • retvals[n&1]; pays a price for memory access.
  • n%2?-1:1; is expressive and good but only with optimization.
alfC
  • Using `pow` for this is certainly overkill, but at least it's readable and easy to understand. – Paul R Mar 17 '15 at 22:16
  • 58
    `pow` is a floating-point function so I'd avoid that. `n%2 ? -1 : 1` seems the most straightforward. – M.M Mar 17 '15 at 22:17
  • 1
    and `n%2 ? -1 : 1` is also likely to be the fastest – user3528438 Mar 17 '15 at 22:20
  • 9
    Another variant would be `(n & 1) ? -1 : 1`. Some people find it less readable, but a comment next to it clears that up and bit-masking is fast. – pjs Mar 17 '15 at 22:22
  • 5
    `(n&1)? -1: 1` should be faster -- it's equivalent in this case but not in general, so it may not be automatically optimized. – Ben Voigt Mar 17 '15 at 22:23
  • 3
    @BenVoigt Since `n%2` is immediately converted to `bool` here, I would expect identical codegen. – T.C. Mar 17 '15 at 22:23
  • 5
    @T.C.: Probably. But note that `1 - 2*(n&1)` is correct, while `(1-2*(n%2))` from the question can give `-1`, `1`, or `3` – Ben Voigt Mar 17 '15 at 22:24
  • 1
    @MattMcNabb, yes, the floating-point conversion in `pow` is reason enough to stay away from it. For some reason I thought that there was an integer overload of `pow` but there is none. (I fell for this before.) http://en.cppreference.com/w/cpp/numeric/math/pow – alfC Mar 17 '15 at 22:31
  • @BenVoigt, can give `3`? how? Am I missing something? – alfC Mar 18 '15 at 02:11
  • @alfC: Because -1%2 is -1, not 1. But both -1 and 1 are true after bool conversion, so it doesn't matter in the case of `i%2?-1:1`. And I think most compilers understand this. – rici Mar 18 '15 at 02:34
  • @rici, ah, for negative `n`, yes. – alfC Mar 18 '15 at 04:40
  • 1
    Never ever use `pow` unless you really have a noninteger power (which is quite rare). It's by far the slowest function you can imagine (for a general case it's orders of magnitude worse than sqrt, exp or log), even for a^n where n is an integer, you should implement it with repeated squaring (much faster). I'd use the third option, it's the most clear about your intentions (hacks like `n&1` are awesome but may be confusing for some coders and the compiler will likely optimize both anyway). – orion Mar 18 '15 at 07:46
  • 3
    @orion: the hyperbole is unnecessary. `pow` is typically only slightly more expensive than an `exp` and a `log` (because it's often implemented that way, albeit with greater internal precision). Certainly not "orders of magnitude worse". – Stephen Canon Mar 18 '15 at 21:12
  • The best approach is to design your algorithm to avoid the calculation at all. It is trivial to do so. See the answer below. Basically you slice the algorithm into two sets of terms, one which adds terms and one which subtracts them. At the end, add the results of each slice. – Fred Mitchell Mar 26 '15 at 01:10
  • @rici ( -1 mod M ) is 1 for positive M. For decades, C and C++ compilers were allowed to define -1%M as either 1 or -1. The historical reason is that the "non-mod" version was fast to calculate on some processors. I think I remember reading that C++ has now standardized on the un-mathematical version, thus assuring that programs will contain tests for negativeness for all time. I simply isolate the unpleasantness with a template: `T mod(T n, T M) { T r = n % M; return r < 0 ? r + M : r; }` – Jive Dadson Jan 04 '18 at 07:23
  • Any solution that contains an "if" or a "?" may call into question how the processor handles branch-prediction, or whether the compiler can optimize-out the branch. – Jive Dadson Jan 04 '18 at 07:37
  • Why do you show the without-optimization code-gen first? That's totally irrelevant. You should be able to get rid of the `movsx` with `retvals[(unsigned)n & 1U]`. The first `mov` to clear the upper 32 bits is also a missed optimization because `and` already clears them. But I'd probably still recommend an ALU approach. It's likely to be able to optimize into whatever expression it's part of. – Peter Cordes Jul 30 '18 at 15:30
  • @PeterCordes, what is ALU? So you think that `retval[(unsigned)n&1u]` is the best solution, is it better than `n%2?-1:1` (with optimization)? – alfC Jul 30 '18 at 19:16
  • 1
    ALU = arithmetic / logic unit, as opposed to memory. The best choice probably depends on how you're using it. I'd guess that `n%2 ? -1 : 1` would do well in cases where you're multiplying something else by that. Instead of an actual multiply, you'll probably get a CMOV implementing the selection between `x` and `-x`, with no actual `-1` constant appearing anywhere. – Peter Cordes Jul 30 '18 at 19:29

7 Answers

50

You can use (n & 1) instead of n % 2 and << 1 instead of * 2 if you want to be super-pedantic, er, I mean optimized.
So the fastest way to compute it on an 8086 processor is:

1 - ((n & 1) << 1)

I just want to clarify where this answer is coming from. The original poster alfC did an excellent job of posting a lot of different ways to compute (-1)^n some being faster than others.
Nowadays with processors being as fast as they are and optimizing compilers being as good as they are we usually value readability over the slight (even negligible) improvements from shaving a few CPU cycles from an operation.
There was a time when one-pass compilers ruled the earth and MUL operations were new and decadent; in those days, a power-of-2 operation was an invitation for gratuitous optimization.

Eli Algranti
  • 12
    `n<<1` is defined as `n * 2` by the C standard (for non-negative `n`), so this is about readability rather than optimization. – M.M Mar 17 '15 at 22:27
  • 7
    Are you going to show the benchmarks that show that this the fastest way? – juanchopanza Mar 17 '15 at 22:28
  • 3
    A good optimizing compiler *might* not compile the branch in `((n&1)?-1:1)`, but if you can do it yourself, why not? – Lee Daniel Crocker Mar 17 '15 at 22:37
  • 2
    @juanchopanza if I had an 8086 processor and an 8086-era C compiler I would, but there is no need; just look at the number of cycles it takes to do a multiplication vs a shift, and an integer remainder vs a logical and. It is much trickier with current compilers and modern 1-cycle-multiplication processors. – Eli Algranti Mar 17 '15 at 23:07
  • 9
    I wonder how many people have an 8086 processor and an 8086 era C compiler. It seems strange to advise using a bit shift in lieu of an integer multiplication by 2 without a strong caveat. – juanchopanza Mar 18 '15 at 06:33
  • These days, `n<<1` is usually an anti-optimization: it doesn't change what the compiler outputs, but it's harder for a human to read and has surprising precedence. (IMO, it's a tossup whether `n&1` is an anti-optimization or a non-optimization) –  Mar 18 '15 at 21:02
  • 1
    @Eli: Just because you wrote a multiplication operation in `n * 2` doesn't mean the compiler will output a `mul` instruction. These days, it's considered a *bug in the compiler* if it outputs `mul` when `shl` is a better choice. –  Mar 18 '15 at 21:03
  • As in another comment by @David Schwartz, using `n % 2u` is the correct equivalent to `n & 1` (only 2 different answers), whereas `n % 2` is different (3 possible answers). Of course, subsequent code and optimization may change things. – chux - Reinstate Monica Mar 18 '15 at 21:11
  • compilers nowadays emit `add` or `lea` for multiplication by 2 – phuclv Mar 19 '15 at 14:43
  • To keep readability you can wrap it to `inline` function `optimized_pow` or something like that. – Smileek Mar 25 '15 at 08:06
37

Usually you don't actually calculate (-1)^n; instead, you track the current sign (as a number that is either -1 or 1) and flip it every operation (sign = -sign). Do this as you handle your n in order and you will get the same result.

EDIT: Note that part of the reason I recommend this is because there is rarely any actual semantic value in the representation (-1)^n; it is merely a convenient method of flipping the sign between iterations.

Guvante
  • 8
    There's no need for tracking anything. The result is a very simple function of `n`. – juanchopanza Mar 17 '15 at 22:27
  • Yes, you see all sorts of things, including this. That is in part the origin of the question. – alfC Mar 17 '15 at 22:33
  • @alfC: I will try and revise, part of my point is that the actual definition is more akin to "flip the sign every other member" but it is convenient to represent that as `(-1)^n`, as that is mathematical notation and correct. Thus this methodology matches the original intention, rather than being a workaround like such tricks usually are. – Guvante Mar 17 '15 at 23:23
  • 9
    Why multiply by `-1` if you can just _flip the sign_: `sign=-sign;`? – Ruslan Mar 18 '15 at 04:49
  • 8
    +1 This answer raises an excellent point! There is "rarely semantic value" in (-1)^n, that is perfectly true and the most important observation to make :) – Ant Mar 18 '15 at 11:56
7

First of all, the fastest isOdd test I know of (as an inline method):

/**
 * Return true if the value is odd
 * @value the value to check
 */
inline bool isOdd(int value)
{
    return (value & 1);
}

Then make use of this test to return -1 if odd, 1 otherwise (which is the actual output of (-1)^N):

/**
 * Return the computation of (-1)^N
 * @n the N factor
 */
inline int minusOneToN(int n)
{
    return isOdd(n) ? -1 : 1;
}

Last, as suggested by @Guvante, you can spare a multiplication by just flipping the sign of a value (avoiding the minusOneToN function):

/**
 * Example of the general usage. Avoids a useless multiplication
 * @value The value to flip if it is odd
 */
inline int flipSignIfOdd(int value)
{
    return isOdd(value) ? -value : value;
}
Flavien Volken
  • Note that `value&1` breaks on one's complement for negative signed integers. Also that any decent compiler will evaluate `(bool)(value % 2)` just as efficiently. (Not `value % 2` as an integer though, because that can be -1, 0 or +1 so it requires checking the sign bit.) – Peter Cordes Jul 30 '18 at 15:33
3

Many algorithms require to compute (-1)^n (both integer), usually as a factor in a series. That is, a factor that is -1 for odd n and 1 for even n.

Consider evaluating the series as a function of -x instead.

Ian Ollmann
2

If it's speed you need, here goes ...

int inline minus_1_pow(int n) {
    static const int retvals[] {1, -1}; 
    return retvals[n&1];
}

The Visual C++ compiler with optimization turned up to 11 compiles this down to two machine instructions, neither of which is a branch. It also optimizes away the retvals array, so there are no cache misses.

Jive Dadson
  • That seems to be simpler than the one at the end of my question. I am curious if in Visual C++, they are the same instructions that `static const int res[] {-1, 1, -1}; const int* const m1pow = res + 1; return m1pow[n%2]` would produce. – alfC Jan 04 '18 at 09:40
  • This was a final contender (see my edit to the question). – alfC Jan 16 '18 at 23:40
  • How does it optimize away the `retvals` array? https://godbolt.org/g/cd1JkX shows that MSVC (like other compilers) is actually loading from memory, from a static array. If used often you won't have any cache misses, but only because it will stay hot in L1d. – Peter Cordes Jul 30 '18 at 15:39
1

What about

(1 - (n%2)) - (n%2)

n%2 most likely will be computed only once

UPDATE

Actually, the simplest and most correct way would be to use a table:

const int res[] {-1, 1, -1};

return res[n%2 + 1];
Severin Pappadeux
  • 1
    For `n = -1` it gives `3` (just like `1-(1-2*(n%2))`) – alfC Mar 18 '15 at 04:42
  • A modulo is still (much) slower than a mask… – Flavien Volken Mar 18 '15 at 14:01
  • @FlavienVolken but a decent compiler will convert a modulo of a power of two into a mask – Alnitak Mar 18 '15 at 16:39
  • 1
    @Alnitak: The conversion is only valid for non-negative values of `n` (e.g. unsigned type). – R.. GitHub STOP HELPING ICE Mar 18 '15 at 18:05
  • 4
    Neither the question nor OP's code `int n` preclude `n` from having a negative value - thus making this solution unusable. Even if not practicable that `n` should even have a negative value, a compiler will likely not conclude that and thus limit optimizations. Suggest simply preface with `unsigned n;` – chux - Reinstate Monica Mar 18 '15 at 20:36
  • @FlavienVolken True, but irrelevant. The relevant question is whether a modulo of a power-of-2 constant is slower than a mask. And the answer is that it isn't, because they're obviously equivalent. – David Schwartz Mar 18 '15 at 20:36
  • @David Schwartz Although true when `n` is `unsigned`, when `n` is `int`, `n%2` is not the same as `n&1` as in OP's code. – chux - Reinstate Monica Mar 18 '15 at 20:41
  • 1
    @chux Right, it should be `n%2u`, which should produce exactly the same code as `n&1`. – David Schwartz Mar 18 '15 at 20:45
  • You are the only one proposing lookups in an array (and also say it is the most correct one). I guess the constant array has to be local (to the block) to avoid lookups far away in memory (?). In any case, an improvement could be to use this simplified notation: `const int res[] {-1, 1, -1}; const int* const m1pow = res + 1;` and then use `m1pow[n%2]`, which is not so bad notationally. I think there is no problem accessing, for `n%2 == -1`, the negative index `m1pow[-1]`. In this way, the addition `+1` is avoided (at least in the code). – alfC Mar 20 '15 at 07:14
  • @alfC Yes, there is no problem accessing an array with a negative index. And your code is the fastest one; you're doing the addition only once, during setup. Local vs non-local is really blurred now with three levels of cache and n-way associativity – Severin Pappadeux Mar 20 '15 at 15:58
  • @alfC: Calculating signed `n%2` is needlessly more expensive than `res[ (n%2)!=0 ]`, which on a two's complement machine is equivalent to `res [ n&1 ]`. gcc/clang/MSVC optimize it correctly (to just an `and` + load), but unfortunately ICC compiles it naively to a -1 / 0 / 1 and then compares. https://godbolt.org/g/5M7ab1 – Peter Cordes Jul 30 '18 at 15:43
1

Well, if we are performing the calculation in a series, why not handle the calculation in a positive loop and a negative loop, skipping the evaluation completely?

The Taylor series expansion to approximate the natural log of (1+x) is a perfect example of this type of problem. Each term has a factor of (-1)^(n+1), or (-1)^(n-1). There is no need to calculate this factor. You can "slice" the problem by either executing 1 loop for every two terms, or two loops, one for the odd terms and one for the even terms.

Of course, since the calculation is, by its nature, over the domain of real numbers, you will be using a floating-point processor to evaluate the individual terms anyway. Once you have decided to do that, you should just use the library implementation of the natural logarithm. But if for some reason you decide not to, it will certainly be faster (though not by much) to avoid wasting cycles calculating the value of -1 to the nth power.

Perhaps each can even be done in separate threads. Maybe the problem can even be vectorized.

Fred Mitchell