Why is equivalent Python code so much slower

Question

can somebody explain why is the following trivial code (implementation of Euclid's algorithm to find greatest common denominator) about 3 times slower then equivalent code in Ruby ?

contents of iter_gcd.py:

from sys import argv,stderr

def gcd(m, n):
    if n > m:
        m, n = n, m
    while n != 0:
        rem = m % n
        m = n
        n = rem
    return m

# in Python3 code there is xrange replaced with range function
def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            comp += gcd(i,j)

    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

contents of iter_gcd.rb:

def gcd(m, n)
    while n != 0
        rem = m % n
        m = n
        n = rem
    end
    return m
end

def main(a1, a2)
    comp = 0
    a1.downto 2 do
        |j|
        1.upto (a2 - 1) do
            |i|
            comp += gcd(i,j)
        end
    end
    puts comp
end

 if __FILE__ == $0
    if ARGV.length != 2
        $stderr.puts('usage: %s num1 num2' % $0)
        exit(1)
    else
        main(ARGV[0].to_i, ARGV[1].to_i)
    end
end

Execution times measurements:

$ time python iter_gcd.py 4000 3000
61356305

real    0m22.890s
user    0m22.867s
sys     0m0.006s

$ python -V
Python 2.6.4


$ time python3 iter_gcd.py 4000 3000
61356305

real    0m18.634s
user    0m18.615s
sys     0m0.009s

$ python3 -V
Python 3.1.2


$ time ruby iter_gcd.rb 4000 3000
61356305

real    0m7.619s
user    0m7.616s
sys     0m0.003s

$ ruby -v
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]

Just curious why I got such results. I considered CPython to be faster in most cases then MRI and even the new Ruby 1.9 on YARV but this "microbenchmark" did really surprised me.

Btw, I know I can use specialised library function like fractions.gcd but I'd like to compare implementations of such basic and trivial language constructs.

Did I miss something or is the implementation of the next Ruby generation so much improved in area of sheer speed ?

python 2 has xrange too... I use it all the time. http://docs.python.org/library/functions.html#xrange — Frank V, Nov 29 '10 at 16:13
http://stackoverflow.com/questions/4068122/why-is-python-slower-compared-to-ruby-even-with-this-very-simple-test/4068176#4068176 — Nakilon, Nov 29 '10 at 16:13
For whatever it's worth, using your code above, python (2.6) is considerably faster than ruby (1.8) on my machine... 11.7 seconds for python vs. 46.7 seconds for ruby. Of course, I'm comparing to ruby 1.8 instead of 1.9, which wasn't quite your question... — Joe Kington, Nov 29 '10 at 16:25
Does Ruby handle arbitrary-size integers, like Python does? If not, this is one reason for Python to be slower than Ruby, here. — Eric O. Lebigot, Nov 29 '10 at 17:07
The Python `gcd` swaps `m` and `n` if `n>m`. The Ruby `gcd` doesn't. This probably doesn't account for all the difference, but it would be better to compare apples with apples. — unutbu, Nov 29 '10 at 17:47
@unutbu: the `gcd()` without `if n>m` is slightly *slower* on my machine. So it doesn't explain the difference. — jfs, Nov 29 '10 at 18:13
The reason it runs faster is 'cuz Euclid was totally into Ruby. :-) — the Tin Man, Nov 29 '10 at 20:25
You could write `n, m = n % m, n` instead of `rem = m % n; m = n; n = rem` in Python. The `if n > m` is unnecessary in `gcd()` if n, m are positive. http://stackoverflow.com/questions/4305518/why-is-equivalent-python-code-so-much-slower/4306668#4306668 — jfs, Nov 30 '10 at 19:22
btw, `fractions.gcd()` is pure Python and it is identical to the one from my answer. So it won't be faster. http://svn.python.org/view/python/trunk/Lib/fractions.py?view=markup — jfs, Dec 02 '10 at 23:02
I've run the `iter_gcd.py` with `python` executable configured with `--enable-profiling`. Here's `gprof` output: https://gist.github.com/734334 (total time is `24.8`s, `gprof` accounts only `12`s); call graphs for original iter_gcd.py, with gcd() replace by dummy function, with gcd() inlined correspondingly: http://imgur.com/O6dHTl&CB5a0&4VsRV — jfs, Dec 09 '10 at 04:39

score 37 · Accepted Answer · edited May 23 '17 at 12:24

Summary

"Because the function call overhead in Python is much larger than in Ruby."

Details

Being a microbenchmark, this really doesn't say much about the performance of either language in proper use. Likely you would want to rewrite the program to take advantage of the strengths of Python and Ruby, but this does illustrate one of the weak points of Python at the moment. The root cause of the speed differences come from function call overhead. I made a few tests to illustrate. See below for code and more details. For the Python tests, I used 2000 for both gcd parameters.

Interpreter: Python 2.6.6
Program type: gcd using function call
Total CPU time: 29.336 seconds

Interpreter: Python 2.6.6
Program type: gcd using inline code
Total CPU time: 13.194 seconds

Interpreter: Python 2.6.6
Program type: gcd using inline code, with dummy function call
Total CPU  time: 30.672 seconds

This tells us that it's not the calculation made by the gcd function that contributes most to the time difference, it's the function call itself. With Python 3.1, the difference is similar:

Interpreter: Python 3.1.3rc1
Program type: gcd using function call
Total CPU time: 30.920 seconds

Interpreter: Python 3.1.3rc1
Program type: gcd using inline code
Total CPU time: 15.185 seconds

Interpreter: Python 3.1.3rc1
Program type: gcd using inline code, with dummy function call
Total CPU time: 33.739 seconds

Again, the actual calculation is not biggest contributor, it's the function call itself. In Ruby, the function call overhead is much smaller. (Note: I had to use smaller parameters (200) for the Ruby version of the programs because the Ruby profiler really slows down real-time performance. That doesn't affect CPU time performance, though.)

Interpreter: ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]
Program type: gcd using function call
Total CPU time: 21.66 seconds

Interpreter: ruby 1.9.2p0 (2010-08-18 revision 29036) [i486-linux]
Program type: gcd using inline code
Total CPU time: 21.31 seconds

Interpreter: ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
Program type: gcd using function call
Total CPU time: 27.00 seconds

Interpreter: ruby 1.8.7 (2010-08-16 patchlevel 302) [i486-linux]
Program type: gcd using inline code
Total CPU time: 24.83 seconds

Notice how neither Ruby 1.8 nor 1.9 suffer greatly from the gcd function call – the function call vs. inline version are more or less equal. Ruby 1.9 seems to be a little better with less difference between the function call and inline versions.

So the answer to the question is: "because the function call overhead in Python is much larger than in Ruby".

Code

# iter_gcd -- Python 2.x version, with gcd function call
#             Python 3.x version uses range instead of xrange
from sys import argv,stderr

def gcd(m, n):
    if n > m:
        m, n = n, m
    while n != 0:
        rem = m % n
        m = n
        n = rem
    return m

def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            comp += gcd(i,j)
    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Python 2.x version, inline calculation
#             Python 3.x version uses range instead of xrange
from sys import argv,stderr

def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            if i < j:
                m, n = j, i
            else:
                m, n = i, j
            while n != 0:
                rem = m % n
                m = n
                n = rem
            comp += m
    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Python 2.x version, inline calculation, dummy function call
#             Python 3.x version uses range instead of xrange
from sys import argv,stderr

def dummyfunc(n, m):
    a = n + m

def main(a1, a2):
    comp = 0
    for j in xrange(a1, 1, -1):
        for i in xrange(1, a2):
            if i < j:
                m, n = j, i
            else:
                m, n = i, j
            while n != 0:
                rem = m % n
                m = n
                n = rem
            comp += m
            dummyfunc(i, j)
    print(comp)

if __name__ == '__main__':
    if len(argv) != 3:
        stderr.write('usage: {0:s} num1 num2\n'.format(argv[0]))
        exit(1)
    else:
        main(int(argv[1]), int(argv[2]))

# iter_gcd -- Ruby version, with gcd function call

def gcd(m, n)
    if n > m
        m, n = n, m
    end
    while n != 0
        rem = m % n
        m = n
        n = rem
    end
    return m
end

def main(a1, a2)
    comp = 0
    a1.downto 2 do
        |j|
        1.upto a2-1 do
            |i|
            comp += gcd(i,j)
        end
    end
    puts comp
end

 if __FILE__ == $0
    if ARGV.length != 2
        $stderr.puts('usage: %s num1 num2' % $0)
        exit(1)
    else
        main(ARGV[0].to_i, ARGV[1].to_i)
    end
end

# iter_gcd -- Ruby version, with inline gcd

def main(a1, a2)
    comp = 0
    a1.downto 2 do |j|
        1.upto a2-1 do |i|
            m, n = i, j
            if n > m
                m, n = n, m
            end
            while n != 0
                rem = m % n
                m = n
                n = rem
            end
            comp += m
        end
    end
    puts comp
end

 if __FILE__ == $0
    if ARGV.length != 2
        $stderr.puts('usage: %s num1 num2' % $0)
        exit(1)
    else
        main(ARGV[0].to_i, ARGV[1].to_i)
    end
end

Test runs

Finally, the commands used to run Python and Ruby with profiling to get the numbers for comparison were pythonX.X -m cProfile iter_gcdX.py 2000 2000 for Python and rubyX.X -rprofile iter_gcdX.rb 200 200 for Ruby. The reason for the difference is that the Ruby profiler adds a lot of overhead. The results are still valid because I'm comparing the difference between a function call and inline code, not the difference between Python and Ruby as such.

The function call overhead *do not* dominates the execution time on my machine https://gist.github.com/733555 — jfs, Dec 08 '10 at 17:07
@J.F. Sebastian: I consistently get slightly better performance from the inline version. The mean over 20 runs differs by around .4 in favour of the inline version. But this just shows the problems of this kind of question: our Python versions could be different, run on different architectures, yours could be compiled with a different compiler, etc. I tried on another machine and results were in the same direction but the difference was larger. The fact remains that Python has quite a large function call overhead and that this question is impossible to answer without defining it more exactly. — Fabian Fagerholm, Dec 08 '10 at 21:44
@Fabian Fagerholm: I've added variant that only calls `dummyfunc()`. It takes `0.8` seconds vs. `4.5` for the inline variant https://gist.github.com/733555 It says that the calculation time is much larger than the time due to the call overhead in this case. What is the ratio `iter_gcd_dummy.py` time / `iter_gcd_inline.py` time on your machine? — jfs, Dec 09 '10 at 00:40
@J.F. Sebastian: The ratio is `0.65` vs `3.81` on a Xeon 2.83 GHz with Python 2.6.5 and `2.98` vs `10.65` on a Pentium M 1.2 GHz with Python 2.6.6. But this is completely obvious, since `iter_gcd_inline.py` runs an extra loop. It apples and oranges. A more proper comparison would be to inline the dummy function (ie. move the `dummyfunc()` addition into the main() function) and compare the "dummy function version" vs. "dummy inline version". In this case, I get the expected difference: moving the addition inline speeds up execution by 40-60%. — Fabian Fagerholm, Dec 09 '10 at 06:16
@Fabian Fagerholm: Your summary is plain wrong according to the results I've got on my machine: https://gist.github.com/735259 That is, the function call overhead in Python might be larger than in Ruby but it is not sufficient to explain the difference. Each file contains timings, see for yourself https://gist.github.com/733555 the difference is less than 10% between inline/not inlined versions. — jfs, Dec 09 '10 at 20:26

jfs · Answer 2 · 2010-12-09T20:14:55.363

22

I can confirm that ruby1.9 is faster than CPython for this "microbenchmark" on my machine:

| Interpreter                     | Time, s | Ratio |
|---------------------------------+---------+-------|
| python-2.6 (cython_gcd.gcd_int) |     2.8 |  0.33 |
| pypy-1.4                        |     3.5 |  0.41 |
| jython-2.5 (java "1.6.0_20")    |     4.7 |  0.55 |
| python-2.6 (cython_gcd.gcd)     |     5.6 |  0.65 |
| ruby-1.9                        |     8.6 |  1.00 |
| jython-2.5                      |     8.9 |  1.03 |
| python-3.2                      |    11.0 |  1.28 |
| python-2.6                      |    15.9 |  1.85 |
| ruby-1.8                        |    42.6 |  4.95 |
#+TBLFM: $3=$2/@6$2;%.2f

Profiler (python -mcProfile iter_gcd.py 4000 3000) shows that 80% of the time spent calling gcd() function, so indeed the difference is in the gcd() function.

I wrote cython_gcd extension for Python using Cython, cython_gcd.pyx:

def gcd(m, n):
    while n:
        n, m = m % n, n
    return m

def gcd_int(int m, int n):
    while n:
        n, m = m % n, n
    return m

It is used in the iter_gcd.py as follows from cython_gcd import gcd, gcd_int.

To try the extension, run: python setup.py build_ext --inplace, where setup.py:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("cython_gcd", ["cython_gcd.pyx"])]

setup(
  name = 'Hello world app',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

To install the extension globally, run python setup.py install.

edited Dec 09 '10 at 20:14

answered Nov 29 '10 at 18:11

jfs

399,953
195
994
1,670

Nice answer, and cool example of cython. However, I think you're misinterpreting the benchmarks (or you were not exact with your words ;). Of course the program spends most of its time in the `gcd()` function – because it's called thousands of times even when the parameters are quite small. So the difference is not *in* the function, it's in the fact that the function is called. Maybe this is what you meant? – Fabian Fagerholm Dec 07 '10 at 21:14
@Fabian Fagerholm: if `gcd = lambda a,b:0` it takes 7% percent of the original time for the `gcd()` function. The fact that `gcd_int()` provides a much better performance could tell us by itself that the call overhead is small. – jfs Dec 08 '10 at 10:45
@J.F. Sebastian: Using cython definitely makes the loop faster, but you also reduce the function call overhead. Your lambda example shows that the calculation contributes greatly to the time used in the Python program, but it doesn't explain the difference between Python and Ruby. – Fabian Fagerholm Dec 08 '10 at 21:16
@Fabian Fagerholm: The lambda example unambiguously says that 93% of the time spent crunching the numbers. Imagine that the call overhead is exactly *zero* then the python version would be sligtly faster (~10%) in this case but still much slower than the ruby version. So yes the lambda example doesn't explain the difference between python and ruby; it explains that the call overhead *alone* **can not** explain the difference either. – jfs Dec 09 '10 at 00:21
1

@J.F. Sebastian: In my answer I showed that inlining the gcd calculation can make the Python version faster than the Ruby version. So something else than the calculation has to explain why Python is slower than Ruby in the gcd function case. I also showed that the difference between function gcd and inline gcd is much larger in Python than in Ruby, pointing to function call overhead in Python. But tests on different machines indicates that these findings vary considerably, so we're back to what I wrote in my answer: this doesn't say much. For an exact answer we'd need to profile much deeper. – Fabian Fagerholm Dec 09 '10 at 06:38

score 10 · Answer 3 · answered Nov 29 '10 at 16:20

10

I seem to remember that ruby handles integers differently than Python, so my guess would be it is simply that Python is spending a lot of time allocating memory while Ruby just mutates the integers in place.

For what it is worth, using Pypy 1.4 reduces the runtime for the Python version on my system from about 15 seconds to under 3 seconds.

answered Nov 29 '10 at 16:20

Duncan

92,073
11
122
156

I'd believe it has something to do with implementation of numbers storing/handling. On the other hand for above calculations sizes of integer and longint for the result are sufficient and I don't expect Python stores them ineffectively as arbitrary numbers, f.e. – David Unric Nov 29 '10 at 16:45
OT: I did have a look at Pypy and it looks promising, although faster then CPython not in all cases. Can we expect where it would be "production ready" ? – David Unric Nov 29 '10 at 16:47
@David Unric: It passes a wide variety of tests right now. If it's not good enough for you now, it probably will never be. – bukzor Nov 29 '10 at 18:08
it's not quite there yet http://codespeak.net/pypy/dist/pypy/doc/cpython_differences.html – dan_waterworth Nov 29 '10 at 19:40
2

http://pypy.org/compat.html this is the newest version of pypy compatibility. Some stuff like refcounting won't change – fijal Dec 02 '10 at 07:05

score 0 · Answer 4 · answered Dec 07 '10 at 18:59

I can't replicate your result. The python code appears to be 4 times faster than the ruby code:

2010-12-07 13:49:55:~/tmp$ time python  iter_gcd.py 4000 3000
61356305

real    0m14.655s
user    0m14.633s
sys 0m0.012s

2010-12-07 13:43:26:~/tmp$ time ruby iter_gcd.rb 4000 3000
iter_gcd.rb:14: warning: don't put space before argument parentheses
61356305

real    0m54.298s
user    0m53.955s
sys 0m0.028s

Versions:

2010-12-07 13:50:12:~/tmp$ ruby --version
ruby 1.8.7 (2010-06-23 patchlevel 299) [i686-linux]
2010-12-07 13:51:52:~/tmp$ python --version
Python 2.6.6

Also, the python code can be made 8% faster:

def gcd(m, n):
    if n > m:
        m, n = n, m
    while n:
        n, m = m % n, n
    return m

def main(a1, a2):
    print sum(
        gcd(i,j)
        for j in xrange(a1, 1, -1)
        for i in xrange(1, a2)
    )

if __name__ == '__main__':
    from sys import argv
    main(int(argv[1]), int(argv[2]))

Later: when I install and use ruby 1.9.1, the ruby code is way faster:

2010-12-07 14:01:08:~/tmp$ ruby1.9.1 --version
ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
2010-12-07 14:01:30:~/tmp$ time ruby1.9.1 iter_gcd.rb 4000 3000
61356305

real    0m12.137s
user    0m12.037s
sys 0m0.020s

I think your question is really, "Why is ruby 1.9.x so much faster than ruby 1.8.x?"

Why is equivalent Python code so much slower

4 Answers4

Summary

Details

Code

Test runs

See also

Linked