28

I am using a simple model with an attribute that stores all the data for that object in a JSONField. Think of it as way to transfer NoSQL data to my PostgreSQL database. Kinda like this:

from django.contrib.postgres.fields import JSONField   

class Document(models.Model):
    content = JSONField()

Each Document object has (more or less) the same keys in its content field, so I am querying and ordering those documents using those keys. For the querying and ordering, I am using Django's annotate() function. I recently came across this:

https://docs.djangoproject.com/en/2.1/ref/contrib/postgres/indexes/

I also know that PostgreSQL using JSONB, which is apparently indexible. So my question is this: Can I index my content field somehow to make my read operations faster for complex queries? And if so, then how do I do it? The documentation page I linked has no examples.

darkhorse
  • 8,192
  • 21
  • 72
  • 148
  • 1
    I think there are similar questions related to this - https://stackoverflow.com/a/49358119/4116955 and https://stackoverflow.com/questions/17807030/how-to-create-index-on-json-field-in-postgres-9-3 Please migrate your json data to a postgresql table using its JSONB type and then try indexing the whole column or a specific field in the json payload. – Mahesh H Viraktamath Feb 22 '19 at 10:20
  • 1
    Thats interesting. However, I was looking for a way to do this from within Django, as that would make my codebase much more easier to manage. – darkhorse Feb 22 '19 at 10:26
  • 9
    Something like this should work - `class Doc(models.Model): data = JSONField() class Meta: indexes = [ GinIndex( fields=['data'], name='data_gin', ), ]` Sorry for the bad formatting. – Mahesh H Viraktamath Feb 22 '19 at 10:38
  • 1
    Wouldn't this index the entire column? Is that a good idea considering there might be all sorts of data types within that JSONField? – darkhorse Feb 22 '19 at 10:57
  • 1
    I think this index will work when you query for rows in a table where the key in a JSONB field maps to a particular value – Mahesh H Viraktamath Feb 22 '19 at 11:03
  • 1
    And also for the best performance, you have to rely on postgresql indices, because it is more a powerful way than index definitions in your python code. For DB performance, you must tune the DB itself rather than your code. – Mahesh H Viraktamath Feb 22 '19 at 11:05

3 Answers3

20

For those that want to index a particular key, create a raw sql migration:

  1. Run ./manage.py makemigrations --empty yourApp where yourApp is the app of the model you want to change indexes for.

  2. Edit the migration i.e.

operations = [
    migrations.RunSQL("CREATE INDEX idx_name ON your_table((json_field->'json_key'));")
]

Where idx_name is the name of the index, your_table is your table, json_field is your JSONField, and json_key in this case is the key you want to index.

Notice the -> operator in the index creation code line, and not ->> as mentioned here.

That should do it, but to verify all went well run the following sql:

SELECT
    indexname,
    indexdef
FROM
    pg_indexes
WHERE
    tablename = '<your-table>';

and see if your index is there.

jfs
  • 53
  • 1
  • 6
radtek
  • 34,210
  • 11
  • 144
  • 111
  • 2
    I think native support was added in 3.2 as part of this PR: https://github.com/django/django/pull/11929 But I'm not clear on how to use. All the Django tickets requesting indexes on JSONB fields were closed based on that PR. – Tim Tisdall Oct 28 '21 at 12:38
12

Django 3.2 introduced native support for these indexes, as Tim Tisdall commented in another answer. If you are adding these to an existing model, be sure to makemigrations and migrate.


To index a particular key, you combine an F expression with a JSONField path lookup on the model's Meta indexes option:

from django.contrib.postgres.fields import JSONField   

class Document(models.Model):
    content = JSONField()
    class Meta:
        indexes = [
            models.Index(models.F("content__example_field_name"), name="content__example_field_name_idx"),
        ]

This will create a B-Tree index.


If you want to use filters, like has_key on a top-level field, (e.g. Document.objects.filter(content__has_key="example_field_name")), you may want a GinIndex as Mahesh H Viraktamath suggested in another comment.

from django.contrib.postgres.fields import JSONField   
from django.contrib.postgres.indexes import GinIndex

class Document(models.Model):
    content = JSONField()
    
    class Meta:
        indexes = [
            GinIndex("content", name="content_idx"),
        ]

Finally, for completeness, you can select your opclass by wrapping with the OpClass expression.

from django.contrib.postgres.fields import JSONField   
from django.contrib.postgres.indexes import GinIndex, OpClass

class Document(models.Model):
    content = JSONField()
    
    class Meta:
        indexes = [
            GinIndex(
                OpClass(models.F("content__example_field_name"), name="jsonb_path_ops"),
                name="content__example_field_name_idx",
            ),
        ]
kcontr
  • 343
  • 2
  • 12
3

There is a bit more universal and Django native way. You can use following custom Migration Operation:

class CreateJsonbObjectKeyIndex(Operation):

    reversible = True

    def __init__(self, model_name, field, key, index_type='btree', concurrently=False, name=None):
        self.model_name = model_name
        self.field = field
        self.key = key
        self.index_type = index_type
        self.concurrently = concurrently
        self.name = name

    def state_forwards(self, app_label, state):
        pass

    def get_names(self, app_label, schema_editor, from_state, to_state):
        table_name = from_state.apps.get_model(app_label, self.model_name)._meta.db_table
        index_name = schema_editor.quote_name(
            self.name or schema_editor._create_index_name(table_name, [f'{self.field}__{self.key}'])
        )
        return table_name, index_name

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        table_name, index_name = self.get_names(app_label, schema_editor, from_state, to_state)
        schema_editor.execute(f"""
            CREATE INDEX {'CONCURRENTLY' if self.concurrently else ''} {index_name} 
            ON {table_name}
            USING {self.index_type}
            (({self.field}->'{self.key}'));
        """)

    def database_backwards(self, app_label, schema_editor, from_state, to_state):
        _, index_name = self.get_names(app_label, schema_editor, from_state, to_state)
        schema_editor.execute(f"DROP INDEX {index_name};")

    def describe(self):
        return f'Creates index for JSONB object field {self.field}->{self.key} of {self.model_name} model'

    @property
    def migration_name_fragment(self):
        return f'create_index_{self.model_name}_{self.field}_{self.key}'

Usage example:

from django.db import migrations

from util.migration import CreateJsonbObjectKeyIndex


class Migration(migrations.Migration):
    atomic = False  # Required if concurrently=True for 0 downtime background index creation

    dependencies = [
        ('app_label', '00XX_prev_migration'),
    ]

    operations = [
        migrations.SeparateDatabaseAndState(
            database_operations=[
                # Operation to run custom SQL command. Check the output of `sqlmigrate` to see the auto-generated SQL
                CreateJsonbObjectKeyIndex(
                    model_name='User', field='meta', key='adid', index_type='HASH',
                    concurrently=True,
                )
            ],
        )
    ]

Tested with Django-2.2 and and AWS Postgres RDS, but should be compatible with other Django

dtatarkin
  • 1,201
  • 8
  • 6