I'm trying to bulk update lots of records (3.2 million).
This is my model:
from django.db import models
from django.contrib.postgres.fields import JSONField  # or django.db.models.JSONField on Django 3.1+

class MyModel(models.Model):
    stats = JSONField()
Where stats is typically saved as:
{
    "my_old_key": [{"hey": "Hello World"}]
}
I want to update it so that only the key changes while its value stays the same; the final stats should look like this (though there are records that already have "my_new_key", so preferably those should be skipped somehow):
{
    "my_new_key": [{"hey": "Hello World"}]
}
I can do it in Python, but it runs really slowly. I've tried it on ~10k records with the batching approach from this link, so that the queryset is not fetched entirely into memory. The final solution looked like this:
def update_queryset(queryset) -> None:
    for stat in queryset:
        dictionary = stat.stats
        if "my_old_key" in dictionary:
            dictionary["my_new_key"] = dictionary.pop("my_old_key")
        stat.save()
It does work, but unfortunately it is far too slow :(.
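To cut down on the number of UPDATE statements, I also considered a variant that streams rows with iterator() and writes back with bulk_update() (a sketch; the batch size of 1000 is an arbitrary guess). It would still pull all 3.2 million rows through Python, though, so I doubt it fixes the underlying problem:

def update_queryset_batched(queryset, batch_size: int = 1000) -> None:
    batch = []
    # Stream rows in chunks instead of loading the whole queryset,
    # and collect modified instances for one bulk UPDATE per batch.
    for stat in queryset.iterator(chunk_size=batch_size):
        if "my_old_key" in stat.stats:
            stat.stats["my_new_key"] = stat.stats.pop("my_old_key")
            batch.append(stat)
        if len(batch) >= batch_size:
            MyModel.objects.bulk_update(batch, ["stats"])
            batch = []
    if batch:
        MyModel.objects.bulk_update(batch, ["stats"])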
I thought about not fetching the data into Python at all and working purely in the database, somewhat like this answer suggested, but I didn't manage to make it work.
Any suggestions on how I can speed it up / write RawSQL / make the jsonb_set approach work?
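This is roughly what I have in mind on the SQL side, but I haven't managed to get it working from Django (a sketch, assuming PostgreSQL jsonb; myapp_mymodel stands in for whatever the real table is called):

from django.db import connection

# Rename the key entirely inside PostgreSQL: copy the value to the new
# key with jsonb_set, drop the old key with the jsonb "-" operator, and
# skip rows that already contain "my_new_key".
with connection.cursor() as cursor:
    cursor.execute(
        """
        UPDATE myapp_mymodel
        SET stats = jsonb_set(stats, '{my_new_key}', stats -> 'my_old_key') - 'my_old_key'
        WHERE stats ? 'my_old_key'
          AND NOT stats ? 'my_new_key'
        """
    )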
EDIT: I've filtered the queryset beforehand so as to skip the additional if check, but to no avail. It still runs slowly:
queryset = MyModel.objects.filter(stats__has_key="my_old_key")
...
def update_queryset(queryset) -> None:
    for stat in queryset:
        dictionary = stat.stats
        dictionary["my_new_key"] = dictionary.pop("my_old_key")
        stat.save()
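If it helps, I assume the records that already have "my_new_key" could also be skipped at the queryset level with an extra exclude, though I haven't verified this:

queryset = (
    MyModel.objects
    .filter(stats__has_key="my_old_key")
    .exclude(stats__has_key="my_new_key")
)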