I'm using ZODB and ZEO, and storing a BTree() in it with ~25 million objects whose keys are text strings of varying lengths. In order to iterate over the objects in a "safe and predictable way", I follow the advice of the BTrees documentation and first make a copy of the list of keys, like so (where db is the BTree object):
for key in list(db.keys()):
    ...do stuff...
However, the creation of that list of keys takes a relatively long time, even on a recent multicore server-grade system with 48 GB of RAM (running CentOS 7, bare metal, not in a VM). If I separately time just the list(db.keys()) call, it takes 9-10 minutes. In terms of size, sys.getsizeof(list(db.keys())) reports 220 MB, which is consistent with the pointer array of a 25-million-element list (note that sys.getsizeof does not count the string contents themselves; the keys vary from roughly 2 to 70 characters).
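As an aside on that measurement, here is a small illustration (with a toy key list, not your real data) of the gap between the list's shallow size and the memory the strings themselves occupy; exact byte counts are CPython-specific:

```python
import sys

# Toy stand-in for the real key list; sizes below are CPython-specific.
keys = [f"key-{i:07d}" for i in range(25_000)]

shallow = sys.getsizeof(keys)  # list header + array of pointers only
deep = shallow + sum(sys.getsizeof(k) for k in keys)  # plus the str objects

print(shallow, deep)  # deep is several times larger than shallow
```

So the 220 MB figure understates what the copied key list really costs in RAM, though it doesn't explain the 9-10 minutes by itself.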
Is there a faster way to do the step of copying the keys to a list, or alternatively, is there a better approach to iterating over the BTree elements in a way that is safe in case other processes are adding objects to the BTree?
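One pattern I've been considering (a sketch, not tested against my real data) is to avoid materializing all 25 million keys and instead walk the tree in batches, re-querying each batch with keys(min=..., excludemin=True), which the BTrees range-search API supports. FakeBTree below is a hypothetical stand-in so the example runs without ZODB installed; with a real OOBTree you would pass the tree itself:

```python
from bisect import bisect_left, bisect_right
from itertools import islice

def iter_keys_in_batches(tree, batch_size=10_000):
    """Yield keys in sorted order without copying them all up front.

    Each batch is re-fetched via tree.keys(min=last, excludemin=True), so
    the cursor tolerates concurrent inserts: keys added ahead of the cursor
    are seen, keys added behind it are skipped.
    """
    last = None
    while True:
        rng = tree.keys() if last is None else tree.keys(min=last, excludemin=True)
        batch = list(islice(iter(rng), batch_size))
        if not batch:
            return
        yield from batch
        last = batch[-1]

class FakeBTree:
    """Hypothetical stand-in mimicking OOBTree.keys(min=..., excludemin=...),
    used only to keep this sketch self-contained."""
    def __init__(self, keys):
        self._keys = sorted(keys)
    def keys(self, min=None, excludemin=False):
        if min is None:
            return self._keys
        cut = bisect_right(self._keys, min) if excludemin else bisect_left(self._keys, min)
        return self._keys[cut:]

tree = FakeBTree(["pear", "apple", "fig", "date", "cherry"])
print(list(iter_keys_in_batches(tree, batch_size=2)))
# ['apple', 'cherry', 'date', 'fig', 'pear']
```

I don't know whether the repeated min-key lookups would end up faster overall than one big list() copy on a tree this size, which is part of what I'm asking.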
I have tried to research faster copy approaches in Python, but what I have found has been focused on copying lists or dictionaries (e.g., past SO questions here and here), not on "listifying" the keys from a BTree.