I have a many-to-many relationship between the Image and Tag tables in my project:
tags2images = db.Table("tags2images",
db.Column("tag_id", db.Integer, db.ForeignKey("tags.id", ondelete="CASCADE", onupdate="CASCADE"), primary_key=True),
db.Column("image_id", db.Integer, db.ForeignKey("images.id", ondelete="CASCADE", onupdate="CASCADE"), primary_key=True)
)
class Image(db.Model):
__tablename__ = "images"
id = db.Column(db.Integer, primary_key=True, autoincrement=False)
title = db.Column(db.String(1000), nullable=True)
tags = db.relationship("Tag", secondary=tags2images, back_populates="images", passive_deletes=True)
class Tag(db.Model):
__tablename__ = "tags"
id = db.Column(db.Integer, primary_key=True, autoincrement=True)
name = db.Column(db.String(250), nullable=False, unique=True)
images = db.relationship(
"Image",
secondary=tags2images,
back_populates="tags",
passive_deletes=True
)
and I'd like to grab a list of tags, ordered by how many times they're used in images. My images and tags tables contain ~200.000 and ~1.000.000 rows respectively, so there's a decent amount of data.
After a bit of messing around, I arrived at this monstrosity:
db.session.query(Tag, func.count(tags_assoc.c.tag_id).label("total"))\
.join(tags_assoc)\
.group_by(Tag)\
.order_by(text("total DESC"))\
.limit(20).all()
and while it does return a list of (Tag, count)
tuples the way I want it to, it takes several seconds, which is not optimal.
I found this very helpful post (Counting relationships in SQLAlchemy) that helped me simplify the above to just
db.session.query(Tag.name, func.count(Tag.id))\
.join(Tag.works)\
.group_by(Tag.id)\
.limit(20).all()
and while this is wicked fast compared to my first attempt, the output obviously isn't sorted anymore. How can I get SQLAlchemy to produce the desired result while keeping the query fast?