Java - Why is the tail call recursion pattern faster than the normal recursion pattern?

Question

i wonder why methods in the "tail call" pattern perform measurably better in Java then such which use a classic recursion pattern.

Given the following example:

public static long facultyApproachPureRec(long n)
{
    if(n == 0 || n == 1)
    {
        return 1;
    }

    long temp = facultyApproachPureRec(n-1);
    return (n*temp);
}

public static long facultyApproachTailEnd(long n, long m)
{
    if(n == 0 || n == 1)
    {
        return m;
    }

    long temp = n*m;
    return facultyApproachTailEnd(n-1, temp);
}

the facultyApproachTailEnd is always faster then the pure recursion solution. This makes me wonder why because Java doesnt support tail call optimization (which i proved: i can get a StachOverflowError easily). Because of this fact I would assume the tail call solution to perform worse then the pure recursive soltion because we always need to pass the double amount of variables.

Can somebody explain me why the Benchmarks indicate that the tail call soltion is always faster than the oter one?

Note: In c# it behaves like i would expect it to behave: the tail call solution is always slower than the pure recursive solution.

Benchmark Code:

    public static void main(String[] args)
{
    for(int k = 0; k < 10; k++) {
        long begin = System.nanoTime();
        for (int i = 0; i < 20000; i++) {
            if (facultyApproachPureRec(i) == 0)
            {
                i++;
            }
        }

        long end = System.nanoTime();
        long elapsed = TimeUnit.MILLISECONDS.convert((end - begin), TimeUnit.NANOSECONDS);
        System.out.println("Pure recursion took " + elapsed);

        begin = System.nanoTime();
        for (int i = 0; i < 20000; i++) {
            if(facultyApproachTailEnd(i, 1) == 0)
            {
                i++;
            }
        }

        end = System.nanoTime();
        elapsed = TimeUnit.MILLISECONDS.convert((end - begin), TimeUnit.NANOSECONDS);
        System.out.println("Tail end took " + elapsed);
    }
}

Some results (validated on 3 machines, all are more or less the same):

Pure recursion took 623
Tail end took 461
Pure recursion took 570
Tail end took 418
Pure recursion took 913
Tail end took 581
Pure recursion took 580
Tail end took 417
Pure recursion took 607
Tail end took 411
Pure recursion took 601
Tail end took 417
Pure recursion took 769
Tail end took 372
Pure recursion took 527
Tail end took 568
Pure recursion took 542
Tail end took 419
Pure recursion took 632
Tail end took 333

Thanks for all answers.

Edit: I tested this a lot more often (1000 iterations, directly via cmd (so no IDE)) before writing this question. If my test is wrong im sorry. I didnt use Java very much outside the university.

Edit2: I appended the generated ByteCode below. I am not familiar with Java Byte Code (i am more used to clr), but from a short view they look pretty equal except that the tail call solution has some more code inside and uses/maintains more variables.

    public static facultyApproachPureRec(J)J
   L0
    LINENUMBER 39 L0
    LLOAD 0
    LCONST_0
    LCMP
    IFEQ L1
    LLOAD 0
    LCONST_1
    LCMP
    IFNE L2
   L1
    LINENUMBER 41 L1
   FRAME SAME
    LCONST_1
    LRETURN
   L2
    LINENUMBER 44 L2
   FRAME SAME
    LLOAD 0
    LCONST_1
    LSUB
    INVOKESTATIC Logik/Rekursion/MainLogik.facultyApproachPureRec (J)J
    LSTORE 2
   L3
    LINENUMBER 45 L3
    LLOAD 0
    LLOAD 2
    LMUL
    LRETURN
   L4
    LOCALVARIABLE n J L0 L4 0
    LOCALVARIABLE temp J L3 L4 2
    MAXSTACK = 4
    MAXLOCALS = 4

  // access flags 0x9
  public static facultyApproachTailEnd(JJ)J
   L0
    LINENUMBER 50 L0
    LLOAD 0
    LCONST_0
    LCMP
    IFEQ L1
    LLOAD 0
    LCONST_1
    LCMP
    IFNE L2
   L1
    LINENUMBER 52 L1
   FRAME SAME
    LLOAD 2
    LRETURN
   L2
    LINENUMBER 55 L2
   FRAME SAME
    LLOAD 0
    LLOAD 2
    LMUL
    LSTORE 4
   L3
    LINENUMBER 56 L3
    LLOAD 0
    LCONST_1
    LSUB
    LLOAD 4
    INVOKESTATIC Logik/Rekursion/MainLogik.facultyApproachTailEnd (JJ)J
    LRETURN
   L4
    LOCALVARIABLE n J L0 L4 0
    LOCALVARIABLE m J L0 L4 2
    LOCALVARIABLE temp J L3 L4 4
    MAXSTACK = 4
    MAXLOCALS = 6
}

Your test is flawed. See [How do I write a correct micro-benchmark in Java](https://stackoverflow.com/q/504103/1876620). — fps, Dec 21 '17 at 21:19
This may be as simple as the fact that the CPU can schedule the multiply and the subtract in parallel in the second version. — Oliver Charlesworth, Dec 21 '17 at 21:19
This is not about bytecode, you need the generated machine code. — Jorn Vernee, Dec 21 '17 at 21:41

Java - Why is the tail call recursion pattern faster than the normal recursion pattern?

0 Answers0