4

Why is JavaScript so much faster in this computation?

I've been running some tests with four simple factorial algorithms: recursion, tail recursion, a while loop, and a for loop. I've run the tests in R, Python, and JavaScript.

I measured the time it took for each algorithm to compute 150 factorial, 5000 times. For R I used system.time(replicate()). For Python I used time.clock(), the resource module, and the timeit module. For JavaScript I used console.time(), Date().getMilliseconds(), and Date().getTime(), running the script using node via the terminal.
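For reference, here is a minimal sketch of what the R harness looks like (the actual implementations are in the gists linked below; the for-loop body here is only an illustrative stand-in):

# Sketch of the R measurement harness: time 5000 computations of 150!
# using the iterative for-loop version (illustrative implementation).
factorialIterF <- function(n) {
  result <- 1
  for (i in seq_len(n)) result <- result * i
  result
}

system.time(replicate(5000, factorialIterF(150)))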

This was never intended to compare running times between languages, but to see which form (recursive, tail recursive, for loop, or while loop) was fastest within each of the languages I'm learning. The performance of the JavaScript algorithms caught my attention, though.

You can see the 4 different factorial algorithms and the measurement implementations here:

R factorial algorithms and performance.

Python factorial algorithms and performance.

JavaScript factorial algorithms and performance.

In the following examples, f stands for the for-loop version and w for the while-loop version.

The results for R are:

Running time of different factorial algorithm implementations, in seconds.
Compute 150 factorial 5000 times:

factorialRecursive()
 user  system elapsed 
0.044   0.001   0.045 

factorialTailRecursive()
 user  system elapsed 
3.409   0.012   3.429 

factorialIterW()
 user  system elapsed 
2.481   0.006   2.488 

factorialIterF()
 user  system elapsed 
0.868   0.002   0.874 

The results for Python are:

Running time of different factorial algorithm implementations.
Uses timeit module, resource module, and a custom performance function.

Compute 150 factorial 5000 times:

factorial_recursive()
timeit:          0.891448974609
custom:          0.87
resource user:   0.870953
resource system: 0.001843

factorial_tail_recursive()
timeit:          1.02211785316
custom:          1.02
resource user:   1.018795
resource system: 0.00131

factorial_iter_w()
timeit:          0.686491012573
custom:          0.68
resource user:   0.687408
resource system: 0.001749

factorial_iter_f()
timeit:          0.563406944275
custom:          0.57
resource user:   0.569383
resource system: 0.001423

The results for JavaScript are:

Running time of different factorial algorithm implementations.
Uses console.time(), Date().getTime() and Date().getMilliseconds()

Compute 150 factorial 5000 times:

factorialRecursive(): 30ms
Using Date().getTime(): 19ms
Using Date().getMilliseconds(): 19ms

factorialTailRecursive(): 44ms
Using Date().getTime(): 44ms
Using Date().getMilliseconds(): 43ms

factorialIterW(): 4ms
Using Date().getTime(): 3ms
Using Date().getMilliseconds(): 3ms

factorialIterF(): 4ms
Using Date().getTime(): 4ms
Using Date().getMilliseconds(): 3ms

If I understand correctly, there is no way to measure CPU time in JavaScript using JS code, and the methods used above measure wall clock time.

The JavaScript wall-clock times are much lower than those of the Python or R implementations.

For example, the wall-clock running time of the for-loop factorial: R: 0.874 s, Python: 0.57 s, JavaScript: 0.004 s.

Why is JavaScript so much faster in this computation?

NPN328
  • None of those languages actually support tail recursion. As for the speed comparisons, V8 really is faster. V8 uses a JIT, so try comparing it to PyPy, as neither CPython nor R uses one. – Blender Oct 07 '13 at 04:30
  • There could be several reasons; my guess is that the JS example has a hot start. The JS interpreter is loaded from the moment you open the browser, while the R or Python interpreter is loaded lazily into memory, thus introducing cache faults. Furthermore, measurements based on seconds are not really significant; you should repeat the experiment millions of times. – Willem Van Onsem Oct 07 '13 at 04:32
  • @CommuSoft I didn't use a browser for the JS tests. I ran the script using node from the terminal: node factorial.js – NPN328 Oct 07 '13 at 04:34
  • @CommuSoft. I don't think a browser is involved. This is run on node.js, right? – Thilo Oct 07 '13 at 04:35
  • I think you're missing something obvious: Python has unbounded integers, while JavaScript uses IEEE double-precision floating point. So while JS is working entirely with native machine types, Python is slinging integers with over 200 decimal digits (`150!` has 263 decimal digits). – Tim Peters Oct 07 '13 at 04:36
  • Please do check the results of each, to see whether the output is calculated correctly. – user568109 Oct 07 '13 at 04:44
  • There's also the fact that JS VMs have had enormous amounts of optimization effort put into them by top-notch teams at browser vendors; the difference basically boils down to the fact that, because JS is the only option in the browser, it simply *had* to be as fast as possible, and browser vendors sank lots of money into it. (c.f. the surprising benchmark results on the Julia homepage: http://julialang.org/) – James Porter Oct 07 '13 at 05:03
  • Note that your R version of `factorialRecursive` is not recursive at all. (I'll write an answer about that) – cbeleites unhappy with SX Oct 07 '13 at 08:50
  • I have no idea what (or how much) optimization the compiler/interpreter does with this benchmarking setup. However, I think it is easy to spot that a large number of identical calls is executed. Now, I can easily see that the functions are deterministic and depend *only* on their one argument. If the interpreter/compiler can spot this as well, it may optimize away the repeated calls. @JCPedroza: how do the timings scale with the number of repetitions? – cbeleites unhappy with SX Oct 07 '13 at 11:10

2 Answers

14

In detail I can only speak for R, but here are my 2 cents. Maybe you can analyze what happens in the other languages and then draw your conclusions.

First of all, however, your R version of factorialRecursive is not recursive: you call R's factorial(n - 1), which uses the $\Gamma$ function.
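To make the issue visible, here is a sketch of the difference (the ...Typo/...Fixed names are mine; the question's actual code is in the linked gist):

# Not actually recursive: the inner call accidentally goes to base R's
# factorial(), which is implemented via the gamma function.
factorialRecursiveTypo <- function(n) {
  if (n < 2) 1 else n * factorial(n - 1)
}

# What was presumably intended: the function calls itself.
factorialRecursiveFixed <- function(n) {
  if (n < 2) 1 else n * factorialRecursiveFixed(n - 1)
}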

Here are my benchmarking results, including the factorial via the gamma function and a more R-ish (vectorized) way of expressing the iterative calculation:

> factorialCumprod <- function(n) cumprod (seq_len (n))[n]

> microbenchmark(factorial(150), 
                 factorialRecursive(150), factorialTailRecursive(150), 
                 factorialIterF(150), factorialIterW(150), factorialCumprod (150),
                 times = 5000)
Unit: microseconds
                        expr     min      lq   median       uq       max neval
              factorial(150)   1.258   2.026   2.2360   2.5850    55.386  5000
     factorialRecursive(150) 273.014 281.325 285.0265 301.2310  2699.336  5000
 factorialTailRecursive(150) 291.732 301.858 306.4690 323.9295  4958.803  5000
         factorialIterF(150)  71.728  74.941  76.1290  78.7830  2894.819  5000
         factorialIterW(150) 218.118 225.102 228.0360 238.3020 78845.045  5000
       factorialCumprod(150)   3.493   4.959   5.3790   5.9375    65.444  5000

microbenchmark randomizes the order of the function calls. Sometimes that does make a difference compared to executing blocks of exactly the same function call.

I guess what you can learn here is that you need to take into account design decisions by the developers of a language/language implementation when you choose your algorithm.

R is known to be slow at recursion. I find that a simple function call that does nothing but return a constant already costs about 750 ns, so the 150 function calls alone would already account for about half the time of the recursive algorithms. Directly calling gamma(150 + 1) instead of doing this indirectly via factorial(150) yields a similar difference. If you want to know more about why this is so, you'll have to ask the R core team.
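For reference, a minimal sketch of how that per-call overhead can be measured (constantFun is a name I made up; the ~750 ns figure is from my laptop and will vary):

library(microbenchmark)

# A function that does nothing but return a constant,
# used to estimate the cost of one R function call.
constantFun <- function() 1

microbenchmark(constantFun(), times = 5000)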

The loops spend an amazing amount of overhead on checking things. To give you an impression:

> for (i in 1 : 3) {
+    cat (i ," ")
+    i <- 10
+    cat (i ,"\n")
+ }
1  10 
2  10 
3  10 

I think saving this work is essentially where the speedup in vectorized functions comes from.

The difference between the while and for iterative versions probably comes from the fact that the n : 1 in the for loop is vectorized. Taking this a step further, i.e. using the cumprod function R provides for cumulative products, considerably speeds up the calculation: we're within a factor of 2 - 3 of R's base implementation via the $\Gamma$ function (you may argue that this is cheating because cumprod probably has a C function behind it - but then the R interpreter is written in C as well, so the distinctions are a bit blurred here).

I think essentially you are paying heavily here for all the safety checks R has and has to have as it is tailored to interactive use. See "Why does Python code run faster in a function?" for somewhat related issues in Python.

Take home message 1: both recursion and explicit loops in R are a sensible option only if the computation in each function call / inside the loop is complicated enough that the overhead doesn't matter.

Take home message 2: Knowing your math can help tremendously: R's factorial has constant run time (about 1.8 μs on my laptop):

[Plot: microbenchmark timings of factorial() vs. factorialCumprod() across the range of inputs whose result fits in a double]

Take home message 3: However, does that speed-up matter at all? For factorials, probably not: the graph covers the whole range of x where the result can still be held by a double, and the computations by both functions never take more than ca. 5 μs. Even your "worst" function does it in 500 μs. If you had large numbers of factorials to calculate, you'd use a look-up table: a vector of 170 elements isn't that big, and factorialCumprod calculates the whole thing for you within 5 μs (see the sketch below).
If you happen to need factorials of larger numbers for your calculations, you should probably try hard to reformulate the problem - I'd expect anyway that numeric trouble is just around the corner (even if you can use big integers - in R there are the packages gmp and Rmpfr).
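A minimal sketch of that look-up table idea (factorial_table is a name chosen for illustration; 171! already overflows a double, so 170 entries cover everything a double can hold):

# Precompute every factorial that still fits into a double.
factorial_table <- cumprod(seq_len(170))

# Looking up 150! is then a single index operation.
factorial_table[150]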


PS: in case you are wondering about Fibonacci numbers, which cannot be replaced by a similarly convenient cumprod or cumsum call, see these blog posts about the recursive and iterative calculation (The worst algorithm in the world?) and the closed-form calculation (Computing Fibonacci numbers using Binet's formula).

cbeleites unhappy with SX
  • The call to factorial(n - 1) in factorialRecursive() in R was a typo! I didn't catch it because the function was behaving as expected. I did wonder why it was so much faster, but the JS times caught my attention more. – NPN328 Oct 07 '13 at 19:26
9

I believe that the main difference is that Python has bignums while JavaScript does not (it uses IEEE 754 double-precision floating point).

So your programs don't compute the same thing. With Python, they compute all the digits of the factorial; with JS, only a crude floating-point approximation with a mantissa of about 15 decimal digits.

To be fair, you would need to find and use a bignum library for JS. See this question.
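To get a feel for the size of that difference, here is a quick illustration in R using the gmp package mentioned in the other answer (a sketch, not the OP's code): the exact value of 150! has 263 digits, while a double keeps only about 15-17 significant digits.

library(gmp)  # arbitrary-precision integers for R

exact  <- factorialZ(150)   # exact integer, like Python's bignums
approx <- factorial(150)    # double-precision value, like JavaScript's number type

nchar(as.character(exact))  # 263 digits in the exact result
approx                      # ~5.713384e+262: only the leading ~15-17 digits are meaningful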

Basile Starynkevitch