List indexing efficiency (python 2 vs python 3)

Question

In answering another question, I suggested to use timeit to test the difference between indexing a list with positive integers vs. negative integers. Here's the code:

import timeit
t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=10000000)
print (t)
t=timeit.timeit('mylist[-1]',setup='mylist=list(range(100))',number=10000000)
print (t)

I ran this code with python 2.6:

$ python2.6 test.py
0.587687015533
0.586369991302

Then I ran it with python 3.2:

$ python3.2 test.py
0.9212150573730469
1.0225799083709717

Then I scratched my head, did a little google searching and decided to post these observations here.

Operating system: OS-X (10.5.8) -- Intel Core2Duo

That seems like a pretty significant difference to me (a factor of over 1.5 difference). Does anyone have an idea why python3 is so much slower -- especially for such a common operation?

EDIT

I've run the same code on my Ubuntu Linux desktop (Intel i7) and achieved comparable results with python2.6 and python 3.2. It seems that this is an issue which is operating system (or processor) dependent (Other users are seeing the same behavior on Linux machines -- See comments).

EDIT 2

The startup banner was requested in one of the answers, so here goes:

Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin

and:

Python 3.2 (r32:88452, Feb 20 2011, 10:19:59) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin

UPDATE

I've just installed fresh versions of python2.7.3 and python3.2.3 from http://www.python.org/download/

In both cases, I took the

"Python x.x.3 Mac OS X 32-bit i386/PPC Installer (for Mac OS X 10.3 through 10.6 [2])"

since I am on OS X 10.5. Here are the new timings (which are reasonably consistent through multiple trials):

python 2.7

$python2.7 test.py
0.577006101608
0.590042829514

python 3.2.3

$python3.2 test.py
0.8882801532745361
1.034242868423462

In 2.7, on my Mac, I get 0.535/0.528 with range, 0.527/0.504 with xrange, so I don't think that's the difference. — abarnert, Jul 09 '12 at 18:09
@AshwiniChaudhary -- That stuff is all in `setup` which doesn't get timed. `xrange` vs `range` should have nothing to do with it. — mgilson, Jul 09 '12 at 18:09
In 3.2, on my Mac, I get 0.381/0.439, which is actually faster than 2.7. But I also get variation as much as 75% between subsequent runs, which implies that you're not running nearly enough reps. — abarnert, Jul 09 '12 at 18:10
Try running the same test several times in a row, looking for the fastest time. One second of runtime is short enough that background operations on your computer are likely to disturb the timing. — Russell Borogove, Jul 09 '12 at 18:16
@abarnert: by the time `mylist[99]`/`mylist[-1]` is invoked, `mylist` is just a list of numbers, `[0, 1, ..., 99]`. Both `range` and `xrange` should produce the same input to the `stmt` call, which is what's actually timed. — DSM, Jul 09 '12 at 18:17
@abarnert -- interesting point. Hadn't thought of that. I don't get the variation you're describing. I've ran it 10-15 times here without any significant variation in the results. Interestingly enough, I just ran it on my Linux desktop and the problem goes away. — mgilson, Jul 09 '12 at 18:18
For reference: there is [this](http://stackoverflow.com/a/5915075/92493) answer discussing the lack of special-casing of small integers in python3.0, maybe it's still missing in python3.2 — Otto Allmendinger, Jul 09 '12 at 18:19
@abarnert: it make look silly but it's necessary to get Python 2/Python 3 compatibility.. — DSM, Jul 09 '12 at 18:20
@OttoAllmendinger -- This is consistent with some of the answers there (much bigger difference on OS-X vs. Linux). You might be right. — mgilson, Jul 09 '12 at 18:24
@mgilson I have roughly and consistently the same results as you on my Linux machine — Otto Allmendinger, Jul 09 '12 at 18:25
@OttoAllmendinger -- What processor do you have? I'm testing an i7 (linux) vs Core2Duo (OS-X) — mgilson, Jul 09 '12 at 18:32
You still haven't told us where you got your Python 3.2 from. Did you install a binary from somewhere, build it with MacPorts or Homebrew, build the source yourself, or what? (And if your 2.6 isn't the stock Apple build, the same question applies there.) — abarnert, Jul 09 '12 at 18:34
@abarnert -- Honestly, I don't remember where I got them. It was probably a couple years ago. I probably downloaded binaries from python website. (I've built it from source on a few machines, but based on the compiler info, it's extremely unlikely that I built either of these versions myself) — mgilson, Jul 09 '12 at 18:40

abarnert · Accepted Answer · 2012-07-09T21:21:16.170

24

This appears to be an artifact of some builds of Python 3.2. The best hypothesis at this point is that all 32-bit Intel builds have the slowdown, but no 64-bit ones do. Read on for further details.

You didn't run nearly enough tests to determine anything. Repeating your test a bunch of times, I got values ranging from 0.31 to 0.54 for the same test, which is a huge variation.

So, I ran your test with 10x the number, and repeat=10, using a bunch of different Python2 and Python3 installs. Throwing away the top and bottom results, averaging the other 8, and dividing by 10 (to get a number equivalent to your tests), here's what I saw:

 1. 0.52/0.53 Lion 2.6
 2. 0.49/0.50 Lion 2.7
 3. 0.48/0.48 MacPorts 2.7
 4. 0.39/0.49 MacPorts 3.2
 5. 0.39/0.48 HomeBrew 3.2

So, it looks like 3.2 is actually slightly faster with [99], and about the same speed with [-1].

However, on a 10.5 machine, I got these results:

 1. 0.98/1.02 MacPorts 2.6
 2. 1.47/1.59 MacPorts 3.2

Back on the original (Lion) machine, I ran in 32-bit mode, and got this:

 1. 0.50/0.48 Homebrew 2.7
 2. 0.75/0.82 Homebrew 3.2

So, it seems like 32-bitness is what matters, and not Leopard vs. Lion, gcc 4.0 vs. gcc 4.2 or clang, hardware differences, etc. It would help to test 64-bit builds under Leopard, with different compilers, etc., but unfortunately my Leopard box is a first-gen Intel Mini (with a 32-bit Core Solo CPU), so I can't do that test.

As further circumstantial evidence, I ran a whole slew of other quick tests on the Lion box, and it looks like 32-bit 3.2 is ~50% slower than 2.x, while 64-bit 3.2 is maybe a little faster than 2.x. But if we really want to back that up, someone needs to pick and run a real benchmark suite.

Anyway, my best guess at this point is that when optimizing the 3.x branch, nobody put much effort into 32-bit i386 Mac builds. Which is actually a reasonable choice for them to have made.

Or, alternatively, they didn't even put much effort into 32-bit i386 period. That possibility might explain why the OP saw 2.x and 3.2 giving similar results on a linux box, while Otto Allmendinger saw 3.2 being similarly slower to 2.6 on a linux box. But since neither of them mentioned whether they were running 32-bit or 64-bit linux, it's hard to know whether that's relevant.

There are still lots of other different possibilities that we haven't ruled out, but this seems like the best one.

edited Jul 09 '12 at 21:21

answered Jul 09 '12 at 18:24

abarnert

354,177
51
601
671

Both should be 32 bit. OS-X 10.5.8 is a 32 bit OS, so I couldn't run a 64 bit version if I wanted to. – mgilson Jul 09 '12 at 18:26
Leopard can run 64-bit apps. Unlike Snow Leopard, at least on Intel, it has a 32-bit-only kernel, and most of the built-in apps are 32-bit, and the stock gcc defaults to 32-bit, but it can still build and run 64-bit apps. – abarnert Jul 09 '12 at 18:30
1

First, this is a great answer (+1) although I'm not 100% convinced this is the reason. I've posted the startup banner and processor for my two machines. As far as I can tell, it looks like I got both python installs from the same place, compiled with the same compiler. If someone else can reproduce your test using OS-X 10.5.8, I'd probably be convinced. (if nothing else comes along, I'll probably accept this answer anyway -- It is a great answer). – mgilson Jul 09 '12 at 18:42
1

I'm installing the current MacPorts 2.6 and 3.2 on a 10.5 machine, which is going to take forever… but an hour after forever, I'll run the test there and see what happens. If it doesn't repro your results, that should conclusively prove there's something weird about your 3.2 build, not about 3.2 in general. If it does, there's room for more investigation. – abarnert Jul 09 '12 at 18:50
I just reinstalled the binaries from http://www.python.org/download/ In both cases (now python 2.7.3 vs 3.2.3), I took the 32 bit, i386/PPC version. The problem is still there, although the timings are now slightly closer. I'll update my post. – mgilson Jul 09 '12 at 18:57
I'm also skeptical. I'm using stock Arch Linux versions of python 2.7.3 and 3.2.3 and get very similiar results to @mgilson – Otto Allmendinger Jul 09 '12 at 18:59
@OttoAllmendinger: But mgilson _didn't_ get similar results on his linux box; he saw 2.6 and 3.2 had comparable speeds. So your results contradict his, rather than backing them up. (If you're using 32-bit Python while he's using 64-bit, that would tell us a lot.) – abarnert Jul 09 '12 at 20:04
This is great! I am running 64 bit on my linux box. I think that I believe your diagnosis (@OttoAllmendinger could put the nail in the coffin if he's using a 32 bit implementation http://stackoverflow.com/q/1405913/748858 ). It still doesn't answer **why** it is slower though other than a hand-waving "they didn't bother to optimize for 32 bit". – mgilson Jul 09 '12 at 20:23
1

OK, true; if you want to know _why_ it's slower you'd have to look at the source code and/or profile the interpreter and/or skim the archives of the python-dev list, all of which are more effort than I'm willing to put in at the moment… – abarnert Jul 09 '12 at 20:47

subdir · Answer 2 · 2012-07-09T20:32:01.263

4

here is a code that illustrates at least part of the answer:

$ python
Python 2.7.3 (default, Apr 20 2012, 22:44:07) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
2.55517697334
>>> t=timeit.timeit('mylist[99L]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.89904499054

$ python3
Python 3.2.3 (default, May  3 2012, 15:54:42) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> t=timeit.timeit('mylist[99]',setup='mylist=list(range(100))',number=50000000)
>>> print (t)
3.9906489849090576

python3 does not have old int type.

edited Jul 09 '12 at 20:32

answered Jul 09 '12 at 20:14

subdir

748
5
6

Very interesting. Using long integers gives timing results **very** similar to the python 3.x version. – mgilson Jul 09 '12 at 20:24
1

Nice. A quick bit of searching showed a post about speeding up long in Python 3.1 because it's so much slower than 2.x int, but unfortunately the site itself isn't reachable. More diligent searching would probably turn up exactly what they did to fix things, and maybe why it didn't help as much for 32-bit. But I can imagine lots of possibilities. For example, with 64-bit words, tricks where small longs fit into one word wouldn't require reading a char at a time… – abarnert Jul 09 '12 at 21:27

pepr · Answer 3 · 2012-07-09T22:29:58.840

0

Python 3 range() is the Python 2 xrange(). If you want to simulate the Python 2 range() in Python 3 code, you have to use list(range(num). The bigger the num is, the bigger difference will be observed with your original code.

Indexing should be independent on what is stored inside the list as the list stores only references to the target objects. The references are untyped and all of the same kind. The list type is therefore a homogeneous data structure -- technically. Indexing means to turn the index value into the start address + offset. Calculating the offset is very efficient with at most one subtraction. This is very cheap extra operation when compared with the other operations.

edited Jul 09 '12 at 22:29

answered Jul 09 '12 at 22:24

pepr

20,112
15
76
139

6

*sigh* -- we've already been through this. the `setup` portion in timeit doesn't get timed. In my setup, I do use `list(range(num))`, so the stuff that is being timed (in both cases) is a list of numbers, not a range object. And I understand the stuff in your second paragraph too. The question is why is it so much slower to do that mentioned operations using python3.x compared to python 2.x? – mgilson Jul 09 '12 at 22:34
@mgilson; +1, you are right. I should read better next time ;) – pepr Jul 10 '12 at 07:43

List indexing efficiency (python 2 vs python 3)

3 Answers3

Linked