
When I compile the following simple recursive code with g++, the generated assembly simply returns i, as if g++ could do the same algebra tricks a human can.

int Identity(int i) {
  if (i == 1)
    return 1;
  else
    return Identity(i-1)+1;
}

I don't think this optimization is plain tail-call elimination, and g++ apparently has to handle at least these two points:

  1. If we pass a negative value, this code will fall into an infinite loop, so is it valid for g++ to eliminate this bug?
  2. It would in principle be possible to enumerate all values from 1 to INT_MAX and verify that this function returns i, but g++ clearly uses a smarter detection method, since compilation is fast. Therefore my question is: how does the compiler optimization do that?

How to reproduce

% g++ -v
gcc version 8.2.1 20181127 (GCC)

% g++ a.cpp -c -O2 && objdump -d a.o
Disassembly of section .text:
0000000000000000 <_Z8Identityi>:
   0:   89 f8                   mov    %edi,%eax
   2:   c3                      retq

Updated: Thanks to many people for answering the problem. I collected some discussions and updates here.

  1. The compiler uses some method to know that passing negative values leads to UB. Maybe the compiler uses the same method to do the algebra tricks.
  2. About tail recursion: According to Wikipedia, my original code is NOT in tail-recursive form. I have tried a tail-recursive version, and gcc generates the expected while loop; however, it does not reduce it to just return i as it does with my original code.
  3. Someone points out that the compiler might try to "prove" f(x) = x but I still do not know the name of the actual optimization technique used. I am interested in the exact name of such an optimization, such as common subexpression elimination (CSE) or some combination of them or whatever.

Updated + answered: Thanks to the answer below (which I accepted; also check the answer from manlio), I think I now understand how a compiler can do this in a simple way. Please see the example below. First, modern gcc can do something more powerful than plain tail-recursion elimination, so the recursion is converted into something like this:

// Equivalent to return x (a negative x never terminates, which is UB)
int Identity_v2(int x) {
  int ans = 0;
  for (int i = x; i != 0; i--, ans++) {}
  return ans;
}
// Equivalent to return x >= 0 ? x + 1 : 0
int Identity_v3(int x) {
  int ans = 0;
  for (int i = x; i >= 0; i--, ans++) {}
  return ans;
}

(I guess that) the compiler can see that ans and i change by the same delta on each iteration, and it also knows that i == 0 when leaving the loop. Therefore, the compiler knows the function simply returns its argument. In v3, I use the >= operator so the compiler also checks the sign of the input for me. This could be much simpler than I had guessed.

  • *"If we pass a negative value, this code will fall into an infinite loop, so is it valid for g++ to eliminate this bug?"* If we enter a negative number, it will run until there's integer underflow after it hits `INT_MIN`, which is undefined behavior. So the compiler can do whatever it wants with that. – Blaze Feb 14 '19 at 08:52
  • So I think problems 1 and 2 can be merged into one: how can g++ analyze the recursion formula smartly enough to know whether this function hits undefined behavior or simply returns i? – johnjohnlys Feb 14 '19 at 09:07
  • @Daniel right, I forgot about that. Sorry. – Blaze Feb 14 '19 at 09:09
  • @Daniel if you modify the function this way, then 0 is the *correct* answer, exactly because overflow (note, that's *not* underflow, underflow is about floating point) for unsigned numbers is well defined. – n. m. could be an AI Feb 14 '19 at 09:12
  • @Daniel it was me who mixed up the terms first :) And yes, you're right, subtracting from unsigned numbers in a way that would make them negative does indeed roll them over like you described. https://stackoverflow.com/a/7221449/10411602 – Blaze Feb 14 '19 at 09:16
  • The second question is unclear. Are you asking what logic the compiler uses to implement this optimisation pass? Most likely the same logic humans do, namely, reasoning by induction. Are you asking about the specific algorithm the compiler runs to implement this logic? You will have to read compiler sources for that. – n. m. could be an AI Feb 14 '19 at 09:18
  • Also I was thinking, if we have it roll around to `INT_MIN` and then all the way back once until it hits `1` again, that means the result must roll all the way up to `INT_MAX` and will also roll around. So wouldn't the final result be `i` either way? (or, since it's negative, `INT_MAX+1-i`) – Blaze Feb 14 '19 at 09:19
  • @Daniel I don't see where a sum of all values is computed. The program only ever adds 1. – n. m. could be an AI Feb 14 '19 at 09:21
  • @Blaze The compiler is **right** to optimize the unsigned version of `Identity` to returning its argument. If `Identity` is signed, passing a value <= 0 leads to UB, so the optimization is right. So regardless of the signedness of the function argument, all is fine. – YSC Feb 14 '19 at 09:22
  • First the compiler turns recursion into a loop. Then it sees a loop where each variable does just +1 or -1 at each iteration, so it can easily deduce the number of iterations and the final result. – Marc Glisse Feb 14 '19 at 09:25
  • @n.m. The human induction process is to first prove f(x) = x for x = 1, then prove that if it holds for x it also holds for x+1, so it holds for all positive integers. But the compiler cannot know in advance that we want to prove "f(x) = x for all integers". Also, the human induction process cannot eliminate the negative value problem mentioned above (I guess). – johnjohnlys Feb 14 '19 at 09:28
  • @user463035818 why are infinite loops undefined behaviour? – n. m. could be an AI Feb 14 '19 at 09:28
  • "it cannot know". You have to *prove* that. It doesn't mean anything anyway. The compiler just tries some canned strategies and sees if any one succeeds. *the human induction process cannot eliminate the negative value problem* I have eliminated the negative value problem, therefore it's possible. – n. m. could be an AI Feb 14 '19 at 09:34
  • @n.m.: [is-infinite-recursion-ub](https://stackoverflow.com/questions/5905155/is-this-infinite-recursion-ub). But we don't have an infinite loop here, btw. – Jarod42 Feb 14 '19 at 09:39
  • @Jarod42 thanks, good to know! – n. m. could be an AI Feb 14 '19 at 10:30

3 Answers


GCC's optimization passes work on an intermediate representation of your code in a format called GIMPLE.

Using the -fdump-* options, you can ask GCC to output intermediate states of the tree and discover many details about the performed optimizations.

In this case the interesting files are (numbers may vary depending on the GCC version):

.004t.gimple

This is the starting point:

int Identity(int) (int i)
{
  int D.2330;
  int D.2331;
  int D.2332;

  if (i == 1) goto <D.2328>; else goto <D.2329>;
  <D.2328>:
  D.2330 = 1;
  return D.2330;
  <D.2329>:
  D.2331 = i + -1;
  D.2332 = Identity (D.2331);
  D.2330 = D.2332 + 1;
  return D.2330;
}

.038t.eipa_sra

The last optimized source which presents recursion:

int Identity(int) (int i)
{
  int _1;
  int _6;
  int _8;
  int _10;

  <bb 2>:
  if (i_3(D) == 1)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:
  _6 = i_3(D) + -1;
  _8 = Identity (_6);
  _10 = _8 + 1;

  <bb 4>:
  # _1 = PHI <1(2), _10(3)>
  return _1;
}

As is normal with SSA, GCC inserts fake functions known as PHI at the start of basic blocks where needed in order to merge the multiple possible values of a variable.

Here:

# _1 = PHI <1(2), _10(3)>

where _1 either gets the value of 1, or of _10, depending on whether we reach here via block 2 or block 3.

.039t.tailr1

This is the first dump in which the recursion has been turned into a loop:

int Identity(int) (int i)
{
  int _1;
  int add_acc_4;
  int _6;
  int acc_tmp_8;
  int add_acc_10;

  <bb 2>:
  # i_3 = PHI <i_9(D)(0), _6(3)>
  # add_acc_4 = PHI <0(0), add_acc_10(3)>
  if (i_3 == 1)
    goto <bb 4>;
  else
    goto <bb 3>;

  <bb 3>:
  _6 = i_3 + -1;
  add_acc_10 = add_acc_4 + 1;
  goto <bb 2>;

  <bb 4>:
  # _1 = PHI <1(2)>
  acc_tmp_8 = add_acc_4 + _1;
  return acc_tmp_8;
}

The same optimisation that handles tail calls also handles trivial cases of making the call tail recursive by creating accumulators.

There is a very similar example in the starting comment of the https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-tailcall.c file:


The file implements the tail recursion elimination. It is also used to analyze the tail calls in general, passing the results to the rtl level where they are used for sibcall optimization.

In addition to the standard tail recursion elimination, we handle the most trivial cases of making the call tail recursive by creating accumulators.

For example the following function

int sum (int n)
{
  if (n > 0)
    return n + sum (n - 1);
  else
    return 0;
}

is transformed into

int sum (int n)
{
  int acc = 0;
  while (n > 0)
    acc += n--;
  return acc;
}

To do this, we maintain two accumulators (a_acc and m_acc) that indicate when we reach the return x statement, we should return a_acc + x * m_acc instead. They are initialized to 0 and 1, respectively, so the semantics of the function is obviously preserved. If we are guaranteed that the value of the accumulator never changes, we omit the accumulator.

There are three cases how the function may exit. The first one is handled in adjust_return_value, the other two in adjust_accumulator_values (the second case is actually a special case of the third one and we present it separately just for clarity):

  1. Just return x, where x is not in any of the remaining special shapes. We rewrite this to a gimple equivalent of return m_acc * x + a_acc.
  2. return f (...), where f is the current function, is rewritten in a classical tail-recursion elimination way, into assignment of arguments and jump to the start of the function. Values of the accumulators are unchanged.
  3. return a + m * f(...), where a and m do not depend on call to f. To preserve the semantics described before we want this to be rewritten in such a way that we finally return a_acc + (a + m * f(...)) * m_acc = (a_acc + a * m_acc) + (m * m_acc) * f(...). I.e. we increase a_acc by a * m_acc, multiply m_acc by m and eliminate the tail call to f. Special cases when the value is just added or just multiplied are obtained by setting a = 0 or m = 1.
manlio
  • This is a very detailed answer to my problem, thanks. While I cannot understand the last part very well, I now know that modern compilers do very elegant transformations to optimize the code. – johnjohnlys Feb 14 '19 at 16:34

gcc is able to optimize recursion even in the case of non-tail-recursive calls. I suppose a number of common recursive patterns are recognized and then translated into their iterative or closed-form counterparts.

You may read this good old short page about (not-)well-known gcc optimization facts.

Jean-Baptiste Yunès
  • It is not a tail-recursive call, but it can easily be transformed into one: see the [what-is-tail-call-optimization](https://stackoverflow.com/a/9814654/2684539) example with factorial. – Jarod42 Feb 14 '19 at 09:45
  • @Jarod42 Be careful, that can't be a general method. Tail-recursive functions are a subset of recursive ones, and there is no general method to transform a recursive function into a tail-recursive form, if one even exists. This is why I said "common patterns". – Jean-Baptiste Yunès Feb 14 '19 at 09:50
  • @Jean-BaptisteYunès I don't think anyone claimed that this works *every time*. In the given cases it works because the return expression is of the form `return op(f(…), …);` and integer addition and multiplication are [associative](https://en.wikipedia.org/wiki/Associative_property) (with the admitted complication that the compiler has to prove its transformation doesn't cause harm wrt overflow of intermediate values). – Arne Vogel Feb 14 '19 at 11:18
  • @ArneVogel isn't this a "pattern"? – Jean-Baptiste Yunès Feb 14 '19 at 11:32

If we pass a negative value, the original code will fall into an infinite loop, so is it valid for g++ to eliminate this bug?

Incrementing/decrementing signed integers can cause overflow/underflow, which is undefined behaviour (unlike for unsigned integers). The compiler simply assumes that UB doesn't happen here (i.e. it always assumes that signed integers don't overflow/underflow, unless you use -fwrapv). If they do, that is a programming error.

Maxim Egorushkin