
Is there an established module, or good practice, to work efficiently with large object pools in Python 3?

What I mean by "object pool" is some class capable of:

  • fetching new instances of specified type, while dynamically extending the memory allocation under the hood when necessary;
  • maintaining a consistent indexing for previously fetched objects.

Here is a basic example:

class Value:
    __slots__ = ('a','b')
    def __init__(self,a=None,b=None):
        self.a = a
        self.b = b

class BasicPool:
    def __init__(self):
        self.data = []
    def __getitem__(self,k):
        return self.data[k]
    def fetch(self):
        v = Value()
        self.data.append(v)
        return v

class BlockPool:
    def __init__(self,bsize=100):
        self.bsize = bsize
        self.next = bsize
        self.data = []
    def __getitem__(self,k):
        b,k = divmod(k,self.bsize)
        return self.data[b][k]
    def fetch(self):
        self.next += 1
        if self.next >= self.bsize:
            self.data.append([ Value() for _ in range(self.bsize) ])
            self.next = 0
        return self.data[-1][self.next]

The BasicPool doesn't do anything smart: whenever a new instance is requested, it is instantiated and appended to an underlying list. The BlockPool, on the other hand, grows a list of pre-allocated blocks of instances. Surprisingly though, preallocation does not seem to be beneficial in practice:

from timeit import default_timer as timer

def benchmark(P):
    N = int(1e6)
    start = timer()
    for _ in range(N): P.fetch()
    print( timer() - start )

print( 'Basic pool:' )
for _ in range(5): benchmark(BasicPool())

#     Basic pool:
#     1.2352294209995307
#     0.5003506309985823
#     0.48115064000012353
#     0.48508202800076106
#     1.1760561199989752

print( 'Block pool:' )
for _ in range(5): benchmark(BlockPool())

#     Block pool:
#     0.7272855400005938
#     1.4875716509995982
#     0.726611527003115
#     0.7369502859983186
#     1.4867010340021807

As you can see, the BasicPool is consistently faster than the BlockPool (I also don't know the cause of these large run-to-run variations). Pools of objects must be a fairly common need in Python; is the best approach really to use the built-in `list.append`? Are there smarter containers that could further improve runtime performance, or is this dominated by the instantiation time anyway?

Jonathan H
    In `BlockPool.fetch()` you have `self.data.append( [Value()]*self.bsize )`. This is creating only one instance of `Value`. Change it to `[ Value() for _ in range(bsize) ]` – Andrej Kesely Jul 26 '19 at 13:12
    @AndrejKesely Thanks a lot for spotting this; this actually makes `BlockPool` less efficient.. I will edit my post. – Jonathan H Jul 26 '19 at 13:20
  • `list` itself is a dynamic array which is almost the same as you implement. And it is implemented with C code which makes it much faster. Well, I think it completely fits your requirements? – Sraw Jul 26 '19 at 13:31
  • @Sraw It would appear so; I am just wondering if there are even more efficient ways or not :) – Jonathan H Jul 26 '19 at 13:37
  • You're still adding one at a time, although it is in a block sized loop. In a language that supports direct memory management, growing a pool is typically by a multiplier -- I'm not sure if you gain from a fixed grow size. – Kenny Ostrom Jul 26 '19 at 13:37
    See the linked answer in the comment under the answer here. It grows it by adding instances of None. That should help. – Kenny Ostrom Jul 26 '19 at 14:21

1 Answer


The whole point of the geometric growth of the array underlying a list is to reduce the reallocation overhead to a constant factor. That constant can easily be smaller than that for manually making blocks (principally because of the slow, interpreted manipulation of self.next and self.data in the latter). (Asymptotically, the cost of BlockPool.fetch is still the append, of course.) Moreover, your benchmark doesn’t include the additional cost of destroying the blocks, nor that of the two-step indexing on read.

So list is surely as good as it gets (without writing your own C code). You can improve BasicPool a bit by inheriting from list rather than containing one, eliminating a dictionary lookup per fetch and the interpreted __getitem__ wrapper entirely.
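A minimal sketch of that suggestion (the class name `ListPool` is mine; the answer only describes the idea). The pool *is* the list, so indexing goes through the C-level `list.__getitem__` directly, and `fetch` avoids the extra `self.data` attribute lookup:

```python
class Value:
    __slots__ = ('a', 'b')
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

class ListPool(list):
    """Pool that inherits from list: no wrapper __getitem__, no self.data lookup."""
    def fetch(self):
        v = Value()
        self.append(v)  # index of v is len(self) - 1
        return v
```
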

Davis Herring
  • 36,443
  • 4
  • 48
  • 76
  • FYI growing pools inheriting from `list` can be implemented as in [this post](https://stackoverflow.com/a/4544699/472610). – Jonathan H Jul 26 '19 at 14:13
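For reference, a sketch along those lines: a `list` subclass that pre-extends itself with `None` placeholders and fills slots on fetch. The class name `GrowingPool` and the doubling policy are my illustration of the linked post's idea, not its exact code:

```python
class Value:
    __slots__ = ('a', 'b')
    def __init__(self, a=None, b=None):
        self.a = a
        self.b = b

class GrowingPool(list):
    """A list subclass that doubles its capacity with None placeholders."""
    def __init__(self, initial=16):
        super().__init__([None] * initial)
        self.size = 0  # number of slots actually filled

    def fetch(self):
        if self.size == len(self):
            # geometric growth keeps the amortized cost per fetch constant
            self.extend([None] * len(self))
        v = Value()
        self[self.size] = v
        self.size += 1
        return v
```

Note that indexing past `self.size` returns a `None` placeholder rather than raising, which may or may not matter depending on how the pool is used.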