3
  • When I query the table using trigram_similar or contains the index is used, as expected.

  • When I query the same table using icontains, Django performs a sequential scan using UPPER.

The difference is 3ms vs 240ms.

  • Should I create a lowercase index and match with contains? (How could this be done?)

  • Should I create a field where all the contents will be lower cased and index that field?

  • Something else?

The model:

class Name(models.Model):
    name_en = models.CharField(max_length=127)
    ...

    class Meta:
        indexes = [
            GinIndex(
                name="name_en_gin_trigram",
                fields=["name_en"],
                opclasses=["gin_trgm_ops"],
            )
        ]
  • The query that uses the index:

    >>> Name.objects.filter(
        Q(name_en__contains='eeth')
        | Q(name_en__trigram_similar='eeth')
    )
    
    SELECT *
    FROM "shop_name"
    WHERE ("shop_name"."name_en"::text LIKE '%eeth%' OR "shop_name"."name_en" % 'eeth')
    LIMIT 21;
    

    The resulting query plan:

    Limit  (cost=64.06..90.08 rows=7 width=121) (actual time=0.447..2.456 rows=14 loops=1)
       ->  Bitmap Heap Scan on shop_name  (cost=64.06..90.08 rows=7 width=121) (actual time=0.443..2.411 rows=14 loops=1)
             Recheck Cond: (((name_en)::text ~~ '%eeth%'::text) OR ((name_en)::text % 'eeth'::text))
             Rows Removed by Index Recheck: 236
             Heap Blocks: exact=206
             ->  BitmapOr  (cost=64.06..64.06 rows=7 width=0) (actual time=0.371..0.378 rows=0 loops=1)
                 ->  Bitmap Index Scan on name_en_gin_trigram  (cost=0.00..20.03 rows=4 width=0) (actual time=0.048..0.049 rows=15 loops=1)
                       Index Cond: ((name_en)::text ~~ '%eeth%'::text)
                 ->  Bitmap Index Scan on name_en_gin_trigram  (cost=0.00..44.03 rows=4 width=0) (actual time=0.318..0.320 rows=250 loops=1)
                       Index Cond: ((name_en)::text % 'eeth'::text)
    Planning Time: 0.793 ms
    Execution Time: 2.531 ms
    (12 rows)
    
  • If I use icontains the index is not used:

    >>> Name.objects.filter(
      Q(name_en__icontains='eeth')
      | Q(name_en__trigram_similar='eeth')
    )
    
    SELECT * 
    FROM "shop_name"
    WHERE (UPPER("shop_name"."name_en"::text) LIKE UPPER('%eeth%') OR "shop_name"."name_en" % 'eeth')
    LIMIT 21;
    

    The resulting query plan:

    Limit  (cost=0.00..95.61 rows=21 width=121) (actual time=10.513..244.244 rows=14 loops=1)
       ->  Seq Scan on shop_name  (cost=0.00..1356.79 rows=298 width=121) (actual time=10.509..244.195 rows=14 loops=1)
             Filter: ((upper((name_en)::text) ~~ '%EETH%'::text) OR ((name_en)::text % 'eeth'::text))
             Rows Removed by Filter: 36774
    Planning Time: 0.740 ms
    Execution Time: 244.299 ms
    (6 rows)
    
raratiru
  • 8,748
  • 4
  • 73
  • 113

2 Answers2

3

Update: This approach will not work as expected. However, the mechanics can be used to address the accepted approach.


Should I create a lowercase index and match with contains? (How could this be done?)

From Django-3.2:

Positional argument *expressions allows creating functional indexes on expressions and database functions.

For example:

Index(Lower('title').desc(), 'pub_date', name='lower_title_date_idx')

creates an index on the lowercased value of the title field in descending order and the pub_date field in the default ascending order.

It sounds like some quality Django music!

The code used to accomplish the above, is the following:

migrations/0001_initial.py:

'''
A fake migration used to install the necessary extensions.
It should be followed by 
./manage.py makemigrations && ./manage.py migrate
'''

from django.contrib.postgres.operations import (
    BtreeGinExtension, 
    TrigramExtension,
)
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = []

    operations = [
        BtreeGinExtension(),
        TrigramExtension(),
    ]

models.py:

from django.contrib.postgres.indexes import GinIndex, OpClass
from django.db import models
from django.db.models.functions import Lower


class Name(models.Model):
    name_en = models.CharField(max_length=127)
    ...

    class Meta:
        indexes = [
            GinIndex(
                OpClass(Lower("name_en"), name="gin_trgm_ops"),
                name="name_en_gin_trigram_lowercase",
            ),
        ]

OpClass is used to avoid the error:

ValueError: Index.opclasses cannot be used with expressions. Use django.contrib.postgres.indexes.OpClass() instead.

Zoe
  • 27,060
  • 21
  • 118
  • 148
raratiru
  • 8,748
  • 4
  • 73
  • 113
3

Django runs icontains with UPPER(), and we can address this by making the index also UPPER():

CREATE INDEX upper_col_name_gin_idx ON table_name USING GIN (UPPER(col_name) gin_trgm_ops)

Django will then run WHERE UPPER("table_name"."col_name"::text) LIKE UPPER('%term%'), using this index.

  • 1
    Thank you for the answer. I assume that using [Index.expressions](https://docs.djangoproject.com/en/dev/ref/models/indexes/#django.db.models.Index.expressions) from Django-3.2, this could also be done in the `Meta` class of `models.Model`. – raratiru Aug 25 '21 at 15:36