0

I'm trying to bulk update lots of records (3.2 million).

This is my model:

class MyModel(models.Model):
    stats = JSONField()

Where stats is typically saved as:

{ 
   "my_old_key": [{"hey": "Hello World"}]
}

I want to update it so that only the key changes it's value and final stats look like this (though there are records with "my_new_key" already so preferably skip those somehow):

{ 
   "my_new_key": [{"hey": "Hello World"}]
}

I can do it in Python but it works really slowly.. I've tried on ~10k records with batching approach from this link so that the queryset is not fetched entirely into the memory. Final solution looked like this:

def update_queryset(queryset) -> None:
    for stat in queryset:
        dictionary = stat.stats
        if "my_old_key" in dictionary:
            dictionary["my_new_key"] = dictionary.pop("my_old_key")
        stat.save()

It does work but unfortunately it works too slowly :(.

I thought about not fetching it to python at all and working purely on database somewhat like this answer suggested but didn't manage to make it work.

Any suggestions on how I can speed it up / write RawSQL / make the jsonb_set approach work?

EDIT: I've filtered queryset before hand so as to skip additional if check but to no avail. Still works slowly..

queryset = MyModel.objects.filter(stats__has_key="my_old_key")
...
def update_queryset(queryset) -> None:
    for stat in queryset:
        dictionary = stat.stats
        dictionary["my_new_key"] = dictionary.pop("my_old_key")
        stat.save() 
Revinder
  • 27
  • 1
  • 5
  • Did you try with `transaction.atomic()` ? – thepylot Sep 15 '21 at 08:10
  • 1
    Use https://docs.djangoproject.com/en/3.2/ref/models/querysets/#bulk-update – lucutzu33 Sep 15 '21 at 08:44
  • @codepylot I did just now, does not seem to make it any faster whether its used to wrap entire `update_queryset` method or `save()` call :(. – Revinder Sep 15 '21 at 08:55
  • @EneP I know about bulk_update but I am not quite sure how to use it with `JSONField` the way I described. Could you please show me how could that look like? – Revinder Sep 15 '21 at 08:56

1 Answers1

1
def update_queryset(queryset) -> None:
    for stat in queryset:
        stat.stats["my_new_key"] = stat.stats.pop("my_old_key")
    MyModel.objects.bulk_update(queryset, ['stats'])
lucutzu33
  • 3,470
  • 1
  • 10
  • 24
  • I'm not sure why but this solution does not update the models IF i am using batching [link](https://djangosnippets.org/snippets/1170/) or using `queryset.iterator()` which I think I should be doing with 3.2 million records? EDIT: It does seem work without batching. – Revinder Sep 15 '21 at 09:48