I'm trying to find a memory-efficient way to run a paged query that tests for an empty collection, but I can't figure out how to do it efficiently on a large database. The table layout uses an Association Object with bidirectional backrefs, very similar to the one in the SQLAlchemy documentation.
from sqlalchemy import Column, ForeignKey, Integer, Text
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import backref, relationship

Base = declarative_base()

class Association(Base):
    __tablename__ = 'Association'
    assoc_id = Column(Integer, primary_key=True, nullable=False, unique=True)
    member_id = Column(Integer, ForeignKey('Member.id'))
    chunk_id = Column(Integer, ForeignKey('Chunk.id'))
    extra = Column(Text)
    chunk = relationship("Chunk", backref=backref("assoc", lazy="dynamic"))

class Member(Base):
    __tablename__ = 'Member'
    id = Column(Integer, primary_key=True, nullable=False, unique=True)
    assocs = relationship("Association", backref="member",
                          cascade="all, delete", lazy="dynamic")

class Chunk(Base):
    __tablename__ = 'Chunk'
    id = Column(Integer, primary_key=True, nullable=False, unique=True)
    name = Column(Text, unique=True)
If a member is deleted, the cascade deletes that member's associations. However, the chunk rows those associations pointed to are left orphaned in the database. To delete the orphaned chunks, I can test for an empty collection using a query like this:
session.query(Chunk).filter(~Chunk.assoc.any())
and then delete the chunks with:
query.delete(synchronize_session=False)
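For reference, here is a minimal end-to-end reproduction of that flow against an in-memory SQLite database; the sample data is made up and the model is trimmed to the relevant columns:

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine
from sqlalchemy.orm import backref, declarative_base, relationship, sessionmaker

Base = declarative_base()

class Association(Base):
    __tablename__ = 'Association'
    assoc_id = Column(Integer, primary_key=True)
    member_id = Column(Integer, ForeignKey('Member.id'))
    chunk_id = Column(Integer, ForeignKey('Chunk.id'))
    chunk = relationship("Chunk", backref=backref("assoc", lazy="dynamic"))

class Member(Base):
    __tablename__ = 'Member'
    id = Column(Integer, primary_key=True)
    assocs = relationship("Association", backref="member",
                          cascade="all, delete", lazy="dynamic")

class Chunk(Base):
    __tablename__ = 'Chunk'
    id = Column(Integer, primary_key=True)
    name = Column(Text, unique=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# One member referencing two chunks, plus one chunk nobody references.
m = Member(id=1)
session.add_all([m, Chunk(id=1, name='a'), Chunk(id=2, name='b'),
                 Chunk(id=3, name='orphan')])
session.add_all([Association(member_id=1, chunk_id=1),
                 Association(member_id=1, chunk_id=2)])
session.commit()

# Deleting the member cascades to its associations, so all three
# chunks end up orphaned.
session.delete(m)
session.commit()

orphans = session.query(Chunk).filter(~Chunk.assoc.any())
orphan_count = orphans.count()
deleted = orphans.delete(synchronize_session=False)
session.commit()
remaining = session.query(Chunk).count()
```

The `~Chunk.assoc.any()` filter compiles to a `NOT EXISTS` correlated subquery, and the bulk delete runs entirely in the database.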
However, if the association and chunk tables are large, the query (or its subquery) seems to load everything at once and memory usage skyrockets.
I've seen the concept of using a paged query to limit the memory usage of standard queries here:
def page_query(q, count=1000):
    offset = 0
    while True:
        r = False
        for elem in q.limit(count).offset(offset):
            r = True
            yield elem
        offset += count
        if not r:
            break

for chunk in page_query(Session.query(Chunk)):
    print(chunk.name)
However, this doesn't appear to help with the empty-collection query: memory usage is still high. Is there a way to do a paged query for an empty collection like this?
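For concreteness, the kind of windowed variant I have in mind is keyset pagination on the primary key, deleting each batch of orphan ids as it is found. The `delete_orphan_chunks` helper and the sample data below are my own sketch (assuming `Chunk.id` orders the table), not something taken from the recipe:

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine
from sqlalchemy.orm import backref, declarative_base, relationship, sessionmaker

Base = declarative_base()

class Association(Base):
    __tablename__ = 'Association'
    assoc_id = Column(Integer, primary_key=True)
    member_id = Column(Integer, ForeignKey('Member.id'))
    chunk_id = Column(Integer, ForeignKey('Chunk.id'))
    chunk = relationship("Chunk", backref=backref("assoc", lazy="dynamic"))

class Member(Base):
    __tablename__ = 'Member'
    id = Column(Integer, primary_key=True)

class Chunk(Base):
    __tablename__ = 'Chunk'
    id = Column(Integer, primary_key=True)
    name = Column(Text, unique=True)

def delete_orphan_chunks(session, window=1000):
    """Delete Chunks with no Associations, one window of ids at a time."""
    total = 0
    last_id = 0
    while True:
        # Fetch the next window of orphan ids above the last one seen;
        # only the ids are loaded, never full Chunk objects.
        ids = [cid for (cid,) in
               session.query(Chunk.id)
                      .filter(Chunk.id > last_id, ~Chunk.assoc.any())
                      .order_by(Chunk.id)
                      .limit(window)]
        if not ids:
            break
        session.query(Chunk).filter(Chunk.id.in_(ids)) \
               .delete(synchronize_session=False)
        session.commit()
        total += len(ids)
        last_id = ids[-1]
    return total

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Ten chunks; only chunk 1 is still referenced by an association.
session.add_all([Chunk(id=i, name='c%d' % i) for i in range(1, 11)])
session.add(Association(member_id=1, chunk_id=1))
session.commit()

removed = delete_orphan_chunks(session, window=3)
```

Unlike LIMIT/OFFSET paging, each window seeks directly to `Chunk.id > last_id`, so the database never rescans rows already processed, but I don't know whether this is the idiomatic way to keep the `NOT EXISTS` subquery's memory use down.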