How to improve count of text query for Django with Postgres

Question

I'm having a problem when counting query results for a full-text search. I have a model where one of the fields is:

search_vector = SearchVectorField(null=True)

with the index:

class Meta:
    indexes = (GinIndex(fields=["search_vector"]),)

Then in the view I have the following query:

query = SearchQuery(termo,config='portuguese')
search_rank = SearchRank(F('search_vector'), query)
entries = Article.objects.annotate(rank=search_rank).filter(search_vector=query).order_by('-rank')

And now if I apply the count method to entries it takes a long time. About 2 seconds, when I have a small database with about 200k rows.

lys · Answer 1 · 2021-12-25T22:41:48.563

If you drop the SearchRank and just filter using the query it will use the GIN index and perform much, much faster:

query = SearchQuery(termo,config='portuguese')
entries = Article.objects.filter(search_vector=query)

You can add .explain() to end to take a look at the query and see if the index is used:

print(entries.explain(analyze=True))

You should see the query using Bitmap Heap Scan and the Execution Time should be much faster.

Bitmap Heap Scan on your_table
...
Planning Time: 0.176 ms Execution Time: 0.453 ms

When you annotate like you are above, you are annotating every Article object - so postgres decides to perform a Seq Scan (or Parallel Seq Scan) which it decides is more efficient. More info here

Try adding .explain(verbose=True) or .explain(analyze=True) to your initial SearchRank method to compare.

query = SearchQuery(termo,config='portuguese')
search_rank = SearchRank(F('search_vector'), query)
entries = Article.objects.annotate(rank=search_rank).filter(search_vector=query).order_by('-rank')

print(entries.explain(analyze=True))

I'm facing this issue myself, with a table with 990k entries that takes ~10 seconds. If you can filter the query before the annotation using any other fields - it will push the query planner back toward using the Index.

If the SELECT returns more than approximately 5-10% of all rows in the table, a sequential scan is much faster than an index scan.

From this answer

How to improve count of text query for Django with Postgres

1 Answers1

Linked