
I have this simple code that helped me to measure how classes with __slots__ perform (taken from here):

import timeit

def test_slots():
    class Obj(object):
        __slots__ = ('i', 'l')

        def __init__(self, i):
            self.i = i
            self.l = []

    for i in xrange(1000):
        Obj(i)

print timeit.Timer('test_slots()', 'from __main__ import test_slots').timeit(10000)

Under Python 2.7 this takes around 6 seconds. Fine: it really is faster (and also more memory-efficient) than the version without slots.

But if I run the code under PyPy (2.2.1, 64-bit, for Mac OS X), it uses 100% CPU and "never" returns (I waited for minutes with no result).

What is going on? Should I use __slots__ under PyPy?

Here's what happens if I pass different numbers to timeit():

timeit(10) - 0.067s
timeit(100) - 0.5s
timeit(1000) - 19.5s
timeit(10000) - ? (probably more than a Game of Thrones episode)

Thanks in advance.


Note that the same behavior is observed if I use namedtuples:

import collections
import timeit

def test_namedtuples():
    Obj = collections.namedtuple('Obj', 'i l')

    for i in xrange(1000):
        Obj(i, [])

print timeit.Timer('test_namedtuples()', 'from __main__ import test_namedtuples').timeit(10000)
alecxe
  • Regardless of everything else, a program either looping infinitely or taking 60x as long as under CPython is a bug and should be raised with the PyPy guys. – Apr 14 '14 at 19:03
  • I would actually expect the `__slots__` version to be slower as those classes are optimized for space and are not optimized for attribute retrieval. – wheaties Apr 14 '14 at 19:11
  • @wheaties yup, but the benchmark shows that `slots` and `namedtuples` are also faster: http://stackoverflow.com/questions/1336791/dictionary-vs-object-which-is-more-efficient-and-why/1336890#1336890 – alecxe Apr 14 '14 at 19:16
  • @delnan I should probably submit an issue to the `PyPy` devs. I was hoping somebody knows what is happening here. The intention was to find even better performance than python+slots - you know: slots are efficient, pypy is efficient - let's try them together :) – alecxe Apr 14 '14 at 19:20
  • @alecxe As far as I understand, at least on CPython using slots means using a struct-like mechanism rather than a hash table. That means a LOT less calculation and less cache-thrashing too. Why shouldn't it be faster? – Adrian Ratnapala Apr 14 '14 at 19:39
  • @AdrianRatnapala well, I understand, this is what the benchmark against cpython shows, no questions about it. The question is about why `PyPy` is hanging on the benchmark. – alecxe Apr 14 '14 at 19:42
  • How does it perform on a single, or small number of iterations? – Marcin Apr 14 '14 at 20:04
  • @Marcin looks like the execution time grows much faster than linearly: for `timeit(100)` it is `~0.5` seconds (5 times slower than cpython), for `timeit(1000)` it is `~19.5` seconds (>20 times slower than cpython). – alecxe Apr 14 '14 at 20:10
  • Out of curiosity, could you try it with the class defined at module-level instead of at function-level? Also, you have `range` instead of `xrange` in the `namedtuple` code. – kenm Apr 14 '14 at 20:25
  • @kwatford defining the class at the module level really helped - `pypy` with `timeit(10000)` finishes in `~0.45s`, wow effect :) – alecxe Apr 14 '14 at 20:31
  • @alecxe I'd guess that creating classes isn't exactly optimized by the jit, which you're asking it to do 10,000 times. Recreating it each run would also likely discard the optimizations it learned on each run. PyPy can be slow while it warms up, so try not to make it warm up repeatedly :) – kenm Apr 14 '14 at 20:39
  • @kwatford looks like you are right about the warm-up. You can add the comment as an answer, and I will accept it if no other answer explains the behavior in more detail. Thank you! – alecxe Apr 14 '14 at 20:42
  • A `namedtuple` is a `tuple` subclass with `__slots__ = ()` in [CPython](http://hg.python.org/cpython/file/4ff37fbcd4e8/Lib/collections/__init__.py#l263) and, since it's implemented in pure Python, also in [PyPy](https://bitbucket.org/pypy/pypy/src/d426723559fba31f23725b1d856a93b188b383ef/lib-python/2.7/collections.py?at=default#cl-236). – Chris Wesseling Apr 24 '14 at 09:07
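
The point in the last comment can be checked directly. This is a small sketch, not taken from the thread; it reuses the `Obj` definition from the question and only inspects the generated class:

```python
import collections

# Same namedtuple definition as in the question.
Obj = collections.namedtuple('Obj', 'i l')

# The generated class is a tuple subclass...
print(issubclass(Obj, tuple))  # True

# ...and it declares an empty __slots__, so it adds no per-instance fields
# of its own beyond the tuple storage.
print(Obj.__slots__)  # ()
```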

2 Answers


In each of the 10,000 or so iterations of the timeit code, the class is recreated from scratch. Creating classes is probably not a well-optimized operation in PyPy; even worse, doing so will probably discard all of the optimizations that the JIT learned about the previous incarnation of the class. PyPy tends to be slow until the JIT has warmed up, so doing things that require it to warm up repeatedly will kill your performance.

The solution here is, of course, to simply move the class definition outside of the code being benchmarked.
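
A minimal sketch of that change, based on the question's code. Two things here are assumptions rather than part of the original: `range` replaces `xrange` so the snippet also runs on Python 3, and the repeat count of 100 is chosen only to keep the run short.

```python
import timeit


class Obj(object):
    # Defined once at module level, so the class object is created a single
    # time instead of being rebuilt on every benchmark call.
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


def test_slots():
    # Only instance creation is measured now.
    for i in range(1000):
        Obj(i)


# Assumed repeat count for illustration; the question used 10000.
elapsed = timeit.timeit(test_slots, number=100)
print(elapsed)
```

With the class outside the timed function, the JIT sees the same class on every call, which is the situation the asker reported finishing in about 0.45s for `timeit(10000)` under PyPy.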

kenm
  • This is interesting, and a little disappointing. Class creation is a lightweight operation under CPython, and there are programming styles which definitely depend on that. – Marcin Apr 15 '14 at 00:06
  • @Marcin I believe the PyPy team considers code snippets being slower than CPython to be bugs, so feel free to file one. Especially if you know of some use of rapid creation of short-lived classes in real code. – kenm Apr 15 '14 at 00:20
  • It's worth noting that even under CPython, creating a new class isn't that cheap. Classes occupy several hundred bytes, way more than a typical object, and are in (and have) several caches, so allocation and deallocation are much slower than for your average object. – Alex Gaynor Apr 15 '14 at 17:13
  • And without getting all Haskelly monadic, you often [don't need them](http://www.youtube.com/watch?v=o9pEzgHorH0). – Chris Wesseling Apr 24 '14 at 09:13

To directly answer the question in the title: __slots__ is pointless for (but doesn't hurt) performance in PyPy.
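
One way to check this on a given interpreter is a side-by-side timing of a slotted and a plain class, both defined at module level. This is only a sketch: `WithSlots`, `WithDict` and `make_instances` are hypothetical names, the repeat count is arbitrary, and no particular result is claimed.

```python
import timeit


class WithSlots(object):
    __slots__ = ('i', 'l')

    def __init__(self, i):
        self.i = i
        self.l = []


class WithDict(object):
    # Same class without __slots__; instances get a regular __dict__.
    def __init__(self, i):
        self.i = i
        self.l = []


def make_instances(cls, count=1000):
    # Create and immediately discard `count` instances of `cls`.
    for i in range(count):
        cls(i)


for cls in (WithSlots, WithDict):
    elapsed = timeit.timeit(lambda: make_instances(cls), number=100)
    print('%s: %.4fs' % (cls.__name__, elapsed))
```

Running the same file under CPython and under PyPy shows how much, if anything, `__slots__` changes on each.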

Armin Rigo
  • Thank you, Armin. Could you please add or refer to any information on how PyPy stores instance variables? – alecxe Apr 15 '14 at 13:27
  • @alecxe http://morepypy.blogspot.ca/2010/11/efficiently-implementing-python-objects.html outlines the approach. – Alex Gaynor Apr 15 '14 at 17:13