0

I have a very large (and growing) table of URLs, and I want to query the table to check if an item exists and return that item so I can edit it, or choose to add a new item. The code below works but runs very very slowly and, given the volume of queries I need to perform (several thousand per hour) is creating some issues. I haven't been able to find a better solution than below. I have a good sense of what is happening - it is loading the entire table every time, but there must be a faster way here.

Session = sessionmaker(bind=engine)
formatted_url = "%{}%".format(url)
matching_url = None
with Session.begin() as session:
    matching_url = session.query(Link.id).filter(Link.URL.like(formatted_url)).yield_per(200).first()

This works great if the URL exists and is recent, but especially if the URL isn't in the database at all, the process takes as long as one minute.

1 Answers1

2

You are doing a select from table where Linkid like %formatted_url% limit 1; This needs a full table scan in the database. If you are lucky, the row is still in memory or cache. If not, or if it does not exist the database will need that full table scan.

If you are using postgres on cloud SQL, this question will help you to remediate the problem PostgreSQL: Full Text Search - How to search partial words?

Iñigo González
  • 3,735
  • 1
  • 11
  • 27