10

Note: I'm a Ruby developer trying to find my way in Python.

When I wanted to figure out why some scripts use mylist[:] instead of list(mylist) to duplicate lists, I made a quick benchmark of the various methods to duplicate range(10) (see code below).

EDIT: I updated the tests to make use of Python's timeit as suggested below. This makes it impossible to directly compare it to Ruby, because timeit doesn't account for the looping while Ruby's Benchmark does, so Ruby code is for reference only.

Python 2.7.2

Array duplicating. Tests run 50000000 times
list(a)     18.7599430084
copy(a)     59.1787488461
a[:]         9.58828091621
a[0:len(a)] 14.9832749367

For reference, I wrote the same script in Ruby too:

Ruby 1.9.2p0

Array duplicating. Tests 50000000 times
                      user     system      total        real
Array.new(a)     14.590000   0.030000  14.620000 ( 14.693033)
Array[*a]        18.840000   0.060000  18.900000 ( 19.156352)
a.take(a.size)    8.780000   0.020000   8.800000 (  8.805700)
a.clone          16.310000   0.040000  16.350000 ( 16.384711)
a[0,a.size]       8.950000   0.020000   8.970000 (  8.990514)

Question 1: what is mylist[:] doing differently that it is 25 % faster than even mylist[0:len(mylist)]. Does it copy in memory directly or what?

Question 2: edit: updated benchmarks don't show huge differences in Python and Ruby anymore. was: Did I implement the tests in some obviously inefficient way, so that Ruby code is so much faster than Python?

Now the code listings:

Python:

import timeit

COUNT = 50000000

print "Array duplicating. Tests run", COUNT, "times"

setup = 'a = range(10); import copy'

print "list(a)\t\t", timeit.timeit(stmt='list(a)', setup=setup, number=COUNT)
print "copy(a)\t\t", timeit.timeit(stmt='copy.copy(a)', setup=setup, number=COUNT)
print "a[:]\t\t", timeit.timeit(stmt='a[:]', setup=setup, number=COUNT)
print "a[0:len(a)]\t", timeit.timeit(stmt='a[0:len(a)]', setup=setup, number=COUNT)

Ruby:

require 'benchmark'

a = (0...10).to_a

COUNT = 50_000_000

puts "Array duplicating. Tests #{COUNT} times"

Benchmark.bm(16) do |x|
  x.report("Array.new(a)")   {COUNT.times{ Array.new(a) }}
  x.report("Array[*a]")   {COUNT.times{ Array[*a] }}
  x.report("a.take(a.size)")   {COUNT.times{ a.take(a.size) }}
  x.report("a.clone")    {COUNT.times{ a.clone }}
  x.report("a[0,a.size]"){COUNT.times{ a[0,a.size] }}
end
Laas
  • 5,978
  • 33
  • 52
  • 3
    Use the python [`timeit` module](http://docs.python.org/library/timeit.html) to measure python execution times. I doubt it'll make things (much) faster but it'll avoid all the usual timing traps. – Martijn Pieters Oct 24 '12 at 11:06
  • As for the time difference in `alist[:]` versus `alist[0:len(alist)]`; the latter creates python `int` objects, something the former method doesn't need to deal with. – Martijn Pieters Oct 24 '12 at 11:08
  • 1
    @MartijnPieters -- The latter also needs to look up the global `len` (and call it) each time – mgilson Oct 24 '12 at 11:13
  • 1
    **`Array(a)` does not duplicate an array**. When given an array it just calls `to_ary` on it, which returns `self`. You should also use [Ruby's Benchmark library](http://www.ruby-doc.org/stdlib-1.9.3/libdoc/benchmark/rdoc/Benchmark.html) instead of doing your timing manually. – Andrew Marshall Oct 24 '12 at 11:59
  • Of course. I must've mixed it up in my head with `Array.new(a)` because I was sure it created new array. Updated code. – Laas Oct 24 '12 at 12:33
  • 1
    Try `obj.dup` in Ruby and benchmark too. – SwiftMango Oct 24 '12 at 13:51
  • tested: `a.dup` wasn't much different from `a.clone` in timing wise – Laas Oct 24 '12 at 14:20
  • If anyone's curious, I've done [some benchmarks in Python 3.6.0](http://stackoverflow.com/a/43220129/3745896) – River Apr 05 '17 at 01:18

2 Answers2

9

Use the timeit module in python for testing timings.

from copy import *

a=range(1000)

def cop():
    b=copy(a)

def func1():
    b=list(a)

def slice():
    b=a[:]

def slice_len():
    b=a[0:len(a)]



if __name__=="__main__":
    import timeit
    print "copy(a)",timeit.timeit("cop()", setup="from __main__ import cop")
    print "list(a)",timeit.timeit("func1()", setup="from __main__ import func1")
    print "a[:]",timeit.timeit("slice()", setup="from __main__ import slice")
    print "a[0:len(a)]",timeit.timeit("slice_len()", setup="from __main__ import slice_len")

Results:

copy(a) 3.98940896988
list(a) 2.54542589188
a[:] 1.96630120277                   #winner
a[0:len(a)] 10.5431251526

It's surely the extra steps involved in a[0:len(a)] are the reason for it's slowness.

Here's the byte code comparison of the two:

In [19]: dis.dis(func1)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)
             15 SLICE+0             
             16 STORE_FAST               1 (b)
             19 LOAD_CONST               0 (None)
             22 RETURN_VALUE        

In [20]: dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (range)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 STORE_FAST               0 (a)

  3          12 LOAD_FAST                0 (a)    #same up to here
             15 LOAD_CONST               2 (0)    #loads 0
             18 LOAD_GLOBAL              1 (len) # loads the builtin len(),
                                                 # so it might take some lookup time
             21 LOAD_FAST                0 (a)
             24 CALL_FUNCTION            1         
             27 SLICE+3             
             28 STORE_FAST               1 (b)
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
Ashwini Chaudhary
  • 244,495
  • 58
  • 464
  • 504
  • This surely answers my questions and shows that there are multiple ways a n00b can write inefficient code - even in the latter timeit variant my `copy` was way slower than your implementation. Thanks! ;-) – Laas Oct 24 '12 at 12:10
  • @Laas glad that helped :), and which one of these came out to be the fastest one in your system? – Ashwini Chaudhary Oct 24 '12 at 12:16
  • @Laas you're right `copy() is not the fastest one, I gad a mistake in my code(forgot to call `cop` function in timeit) – Ashwini Chaudhary Oct 24 '12 at 12:24
  • Yes, I got `a[:]` as the winner too. I updated the question with my timings. – Laas Oct 24 '12 at 12:43
6

I can't comment on the ruby timing vs. the python timing. But I can comment on list vs. slice. Here's a quick inspection of the bytecode:

>>> import dis
>>> a = range(10)
>>> def func(a):
...     return a[:]
... 
>>> def func2(a):
...     return list(a)
... 
>>> dis.dis(func)
  2           0 LOAD_FAST                0 (a)
              3 SLICE+0             
              4 RETURN_VALUE        
>>> dis.dis(func2)
  2           0 LOAD_GLOBAL              0 (list)
              3 LOAD_FAST                0 (a)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE 

Notice that list requires a LOAD_GLOBAL to find the function list. Looking up globals (and calling functions) in python is relatively slow. This would explain why a[0:len(a)] is also slower. Also remember that list needs to be able to handle arbitrary iterators whereas slicing doesn't. This means that list needs to allocate a new list, pack elements into that list as it iterates over the list and resize when necessary. There are a few things in here which are expensive -- resizing if necessary and iterating (effectively in python, not C). With the slicing method, you can calculate the size of the memory you'll need so can probably avoid resizing, and the iteration can be done completely in C (probably with a memcpy or something.

disclaimer : I'm not a python dev, so I don't know how the internals of list() are implemented for sure. I'm just speculating based what I know of the specification.

EDIT -- So I've looked at the source (with a little guidance from Martijn). The relevant code is in listobject.c. list calls list_init which then calls listextend at line 799. That function has some checks to see if it can use a fast branch if the object is a list or a tuple (line 812). Finally, the heavy lifting is done starting at line 834:

 src = PySequence_Fast_ITEMS(b);
 dest = self->ob_item + m;
 for (i = 0; i < n; i++) {
     PyObject *o = src[i];
     Py_INCREF(o);
     dest[i] = o;
 }

Compare that to the slice version which I think is defined in list_subscript (line 2544). That calls list_slice (line 2570) where the heavy lifting is done by the following loop (line 486):

 src = a->ob_item + ilow;
 dest = np->ob_item;
 for (i = 0; i < len; i++) {
     PyObject *v = src[i];
     Py_INCREF(v);
     dest[i] = v;
 }

They're pretty much the same code, so it's not surprising that the performance is almost the same for large lists (where the overhead of the small stuff like unpacking slices, looking up global variables, etc becomes less important)


Here's how I would run the python tests (and the results for my Ubuntu system):

$ python -m timeit -s 'a=range(30)' 'list(a)'
1000000 loops, best of 3: 0.39 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[:]'
10000000 loops, best of 3: 0.183 usec per loop
$ python -m timeit -s 'a=range(30)' 'a[0:len(a)]'
1000000 loops, best of 3: 0.254 usec per loop
mgilson
  • 300,191
  • 65
  • 633
  • 696
  • Weird, but `list(a)` comes out to be the fastest one on my system. – Ashwini Chaudhary Oct 24 '12 at 11:24
  • @AshwiniChaudhary -- what system? That is odd. It's slowest for me on my OS-X system and my Ubuntu linux system. – mgilson Oct 24 '12 at 11:34
  • @AshwiniChaudhary: your lookup is local, vs. global in mgilson's case. If mgilson added `len = __builtins__.len` in his function it'd be faster too, I bet. – Martijn Pieters Oct 24 '12 at 11:43
  • @MartijnPieters -- How is Ashwini's lookup local? Just because `a` is local that doesn't make `len` local ... (Though you're right that local variables are looked up faster than globals...). I suspect that the problem with his test is that `range` is included in the timing. This could make his results a lot more sensitive to various system fluctuations ... – mgilson Oct 24 '12 at 11:54
  • @mgilson: it's in the iPython shell, so `locals() is globals()` is `True`. And `range()` is not included in his timings, only in the disassembly examples. – Martijn Pieters Oct 24 '12 at 11:56
  • @halex timed them again using simple `timeit` procedure(see the updated solution) with `range(1000)` and `list()` was slower this time . and `copy()` was the winner. – Ashwini Chaudhary Oct 24 '12 at 11:58
  • @AshwiniChaudhary It's intersting that the more elements the list has the better is `list` doing compared with `[:]`. Has this to do with JIT compiling? – halex Oct 24 '12 at 12:00
  • @MartijnPieters -- Interesting. I suppose I don't know anything about the IPython shell ... – mgilson Oct 24 '12 at 12:03
  • @halex -- As far as I know, Cpython has no JIT. (`pypy`'s a different story -- in fact, that's the reason that it exists :). This problem is getting deeper and deeper. I like it. – mgilson Oct 24 '12 at 12:04
  • @MartijnPieters It's weird but `timeit` results on Ipython and using simple `timeit` vary too much. – Ashwini Chaudhary Oct 24 '12 at 12:06
  • @mgilson Thank you. I did not know that Cpython did not JIT. Thought it does because everyone today does :) I'm using Cpython 3.1 (and therefore change the `range` to `list(range)` to behave like python2) – halex Oct 24 '12 at 12:06
  • @mgilson: It applies to the regular python shell as well, *and* to modules. – Martijn Pieters Oct 24 '12 at 12:13
  • @AshwiniChaudhary: So, `list()` has a higher setup cost but copying the contents itself is faster for `list()` than for `alist[:]`. The cost of the setup is eventually offset by this difference given enough elements. – Martijn Pieters Oct 24 '12 at 12:18
  • @MartijnPieters -- Do you happen to know where `list` is defined in the C code? (finding `a[:]` was easy enough to track down in `listobject.c` -- I'd like to have a look at the implementation here and see what's going on ...) – mgilson Oct 24 '12 at 12:32
  • @mgilson: Same place; [`list_init`](http://hg.python.org/cpython/file/84cd07899baf/Objects/listobject.c#l2439) is the constructor. The constructor calls `.extend()` (C function `listextend`) with the argument passed in. – Martijn Pieters Oct 24 '12 at 12:46