Is there an established module, or good practice, to work efficiently with large object pools in Python 3?
What I mean by "object pool" is some class capable of:
- fetching new instances of specified type, while dynamically extending the memory allocation under the hood when necessary;
- maintaining a consistent indexing for previously fetched objects.
Here is a basic example:
class Value:
__slots__ = ('a','b')
def __init__(self,a=None,b=None):
self.a = a
self.b = b
class BasicPool:
def __init__(self):
self.data = []
def __getitem__(self,k):
return self.data[k]
def fetch(self):
v = Value()
self.data.append(v)
return v
class BlockPool:
def __init__(self,bsize=100):
self.bsize = bsize
self.next = bsize
self.data = []
def __getitem__(self,k):
b,k = divmod(k,self.bsize)
return self.data[b][k]
def fetch(self):
self.next += 1
if self.next >= self.bsize:
self.data.append([ Value() for _ in range(self.bsize) ])
self.next = 0
return self.data[-1][self.next]
The BasicPool
doesn't do anything smart: whenever a new instance is requested, it is instanciated and appended to an underlying list
. On the other hand, the BlockPool
grows a list of pre-allocated blocks of instances. Surprisingly though, it seems that preallocation is not beneficial in practice:
from timeit import default_timer as timer
def benchmark(P):
N = int(1e6)
start = timer()
for _ in range(N): P.fetch()
print( timer() - start )
print( 'Basic pool:' )
for _ in range(5): benchmark(BasicPool())
# Basic pool:
# 1.2352294209995307
# 0.5003506309985823
# 0.48115064000012353
# 0.48508202800076106
# 1.1760561199989752
print( 'Block pool:' )
for _ in range(5): benchmark(BlockPool())
# Block pool:
# 0.7272855400005938
# 1.4875716509995982
# 0.726611527003115
# 0.7369502859983186
# 1.4867010340021807
As you can see, the BasicPool
is always faster than the BlockPool
(I also don't know the cause of these large variations). Pools of objects must be a fairly common need in Python; is the best approach really to use the builtin list.append
? Are there smarter containers that can be used to further improve runtime performance, or is this dominated by the instanciation time anyway?