2

Please help me solve a benchmark question about Elixir vs. Ruby performance.

I tried to implement the same factorial in both languages, and Ruby shows better results than Elixir:

# ruby_factorial_with_iterator.rb
def factorial_with_iterator(n)
  res = 1
  (1..n).each{|time| res *= time}
  res
end

p "factorial_with_iterator(200000)"
p factorial_with_iterator(200000)

After run:

$ time ruby ruby_factorial_with_iterator.rb
real  0m18.378s
user  0m17.348s
sys   0m0.844s

and two Elixir examples:

# elixir_factorial_with_iterator.exs
defmodule FactorialWithIterator do
  def of(n) do
    Enum.reduce(1..n, 1, &*/2)
  end
end

IO.puts "Factorial of 200000: "
IO.puts FactorialWithIterator.of(200000)

After run:

$ time elixir elixir_factorial_with_iterator.exs
real  1m1.735s
user  1m1.556s
sys   0m0.104s

Another example:

# elixir_factorial_with_recursion.exs
defmodule FactorialWithRecursion do
  def of(0), do: 1
  def of(n) when n > 0 do
    n * of(n - 1)
  end
end

IO.puts "Factorial of 200000: "
IO.puts FactorialWithRecursion.of(200000)

After run:

$ time elixir elixir_factorial_with_recursion.exs
real  1m7.149s
user  1m6.248s
sys   0m0.092s

Why is there such a huge difference: Elixir - 1m1s, and Ruby - just 18s? Or how to write correct iteration code in the Elixir?

P.S. Environment:

  • Ubuntu 16.04
  • ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
  • Erlang/OTP 19 [erts-8.3] [source-d5c06c6] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
  • Elixir 1.4.4
  • 7
    Number crunching [is not one of Erlang's strengths](https://stackoverflow.com/q/1308527/125816). Concurrency/parallelism is. – Sergio Tulentsev Jun 08 '17 at 21:09
  • This sort of question won't go anywhere because it's too broad. You might find information why on [programmers.se]. – the Tin Man Jun 08 '17 at 21:23
  • 1
    The simple answer is that benchmarking is very, very hard. I encourage everybody to read this mail by one of the maintainers of the JMH JVM benchmark harness: https://groups.google.com/d/msg/mechanical-sympathy/m4opvy4xq3U/7lY8x8SvHgwJ Yes, it is about JMH, and yes, it is about JVM, but it actually applies to all benchmark harnesses on all modern highly-optimizing language implementations. You really need an advanced understanding of modern high-performance dynamic optimizing compilers, statistics, hardware architecture, and lots of other things to write meaningful benchmarks. – Jörg W Mittag Jun 08 '17 at 22:15
  • 2
    Not to mention that the code isn't even doing the same thing. The Ruby version is an imperative, side-effecting, impure, stateful loop. The Elixir versions are a fold and a recursive function, those are three very different algorithms. – Jörg W Mittag Jun 09 '17 at 01:26
  • 3
    Interestingly enough, when I tried parallelizing it, the vast majority of CPU time is spent in the final couple of multiplies, where a handful of really big numbers get multiplied. It seems that it's not Elixir per se, but rather Erlang's bignum library is lacking for very large numbers. – cdegroot Jun 09 '17 at 14:06
  • 1
    Using time also means you are timing the startup time of the VM as well. – Fred the Magic Wonder Dog Jun 09 '17 at 16:18
  • 4
    Elixir is designed to be a compiled language and you are using it's "scripting" feature here (see .exs extensions) which is not its main purpose. Elixir is optimized for maximum speed on the compiled code in the already running Erland VM, not running scripts. Ruby is a scripting language and optimized for that use case. – fjahr Apr 26 '18 at 13:57

1 Answers1

14

As mentioned in one of the comments, you're using time, which also times the launching of the VM and, in elixir's case, the compiling of the code to BEAM bytecode. To avoid counting all that, you should use benchmarking tools in the languages themselves.

I was curious, so I tried benchmarking these function myself.

I used:

Ruby:

require 'benchmark/ips'

def factorial_with_iterator(n)
  res = 1
  (1..n).each{|time| res *= time}
  res
end

Benchmark.ips do |x|
  x.config(time: 5, warmup: 2)
  x.report('factorial_with_iterator.rb') do
    factorial_with_iterator(200000)
  end
  x.compare!
end

Elixir:

defmodule Factorial do
  def iter(n) do
    Enum.reduce(1..n, 1, &*/2)
  end

  def recur(0), do: 1

  def recur(n) when n > 0 do
    n * recur(n - 1)
  end
end

Benchee.run(%{
  "factorial_with_iter.ex" => fn -> Factorial.iter(200000) end,
  "factorial_with_recur.ex" => fn -> Factorial.recur(200000) end
})

I got these results:

Ruby:

Warming up --------------------------------------
factorial_with_iterator.rb
                         1.000  i/100ms
Calculating -------------------------------------
factorial_with_iterator.rb
                          0.033  (± 0.0%) i/s -      1.000  in  29.994713s

Elixir:

Name                              ips        average  deviation         median         99th %
factorial_with_iter.ex         0.0395        25.29 s     ±0.00%        25.29 s        25.29 s
factorial_with_recur.ex        0.0368        27.17 s     ±0.00%        27.17 s        27.17 s

Comparison:
factorial_with_iter.ex         0.0395
factorial_with_recur.ex        0.0368 - 1.07x slower

So these results show Elixir being slightly faster with both implementations with Ruby taking ~30 seconds and Elixir taking ~25 and ~27 seconds.

Using "iterations per second" though, for function which take much longer than a second might be a little "wrong". So I also tried with a much lower input. Instead of 200_000, I used 1_000, and got these results:

Ruby:

Warming up --------------------------------------
factorial_with_iterator.rb
                       169.000  i/100ms
Calculating -------------------------------------
factorial_with_iterator.rb
                          1.750k (± 8.0%) i/s -      8.788k in   5.064619s

Elixir:

Name                              ips        average  deviation         median         99th %
factorial_with_recur.ex        3.15 K      317.36 μs    ±12.72%         306 μs      481.87 μs
factorial_with_iter.ex         3.02 K      331.13 μs    ±16.83%         316 μs         559 μs

Comparison:
factorial_with_recur.ex        3.15 K
factorial_with_iter.ex         3.02 K - 1.04x slower

Curiously, this shows Elixir being much faster than Ruby. Elixir was able to perform over 3k iteration per second for both implementations, where Ruby was only able to perform 1.75k iteration per second.

Using:

  • ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
  • Elixir 1.6.5 (compiled with OTP 19) running on Erlang/OTP 21 [erts-10.0] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe] [dtrace]
doughsay
  • 326
  • 3
  • 11
  • 1
    Different BigInteger implementations presumably scale differently with larger numbers. In the Fac(1k) case, more of the time is spent in much smaller multiplies. Fac(200k) is so big that you might even be getting into L1d cache effects for the last few multiplies. – Peter Cordes Jun 23 '18 at 01:03
  • 2
    Just checked, Fac(200000) takes ~394kiB. The inputs are each about half that size, of course. So not just L1d misses but L2 misses as well in typical CPUs with 256k L2. IDK if different implementations might have different access patterns. – Peter Cordes Jun 23 '18 at 01:34