55

I'm using Django with an SQLite backend, and write performance is a problem. I may graduate to a "proper" DB at some stage, but for the moment I'm stuck with SQLite. I think my write performance problems are probably related to the fact that I'm creating a large number of rows, and presumably each time I save() one, it's locking, unlocking and syncing the DB file on disk.

How can I aggregate a large number of save() calls into a single database operation?

kdt
  • http://stackoverflow.com/questions/1136106/efficent-way-to-insert-thousands-of-records-into-a-table-sqlite-python-django – Tomasz Wysocki Aug 03 '10 at 09:53
  • @Tomasz - thank you for the link, the @commit_manually decorator gives me the performance improvement I need. I have marked my question for closure as a dupe because I think the other one covers it quite well. – kdt Aug 03 '10 at 10:22
  • You can actually close your question rather than simply vote for closure, you know. – S.Lott Aug 03 '10 at 12:41
  • Using parameterized queries is also much faster (sketch below). See http://stackoverflow.com/questions/904796/how-do-i-get-around-the-problem-in-sqlite-and-c/926251#926251 – TTT Aug 04 '10 at 03:57
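
For reference, a minimal sketch of the parameterized-query point from the last comment, using the standard sqlite3 module directly rather than the ORM (the myapp_entry table name is hypothetical, not from the question):

import sqlite3

conn = sqlite3.connect("db.sqlite3")
with conn:  # the connection context manager wraps all the inserts in one transaction
    conn.executemany(
        "INSERT INTO myapp_entry (headline) VALUES (?)",  # ? placeholders let SQLite reuse the prepared statement
        [("Headline #%d" % i,) for i in range(100)],
    )
conn.close()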

4 Answers

82

EDITED: `commit_on_success` is deprecated and was removed in Django 1.8. Use `transaction.atomic` instead. See Fraser Harris's answer.

Actually this is easier to do than you think. You can use transactions in Django. These batch database operations (specifically save, insert, and delete) into a single commit. I've found the easiest one to use is commit_on_success. Essentially you wrap your database save operations into a function and then apply the commit_on_success decorator.

from django.db.transaction import commit_on_success

@commit_on_success
def lot_of_saves(queryset):
    for item in queryset:
        modify_item(item)   # placeholder for whatever changes each row
        item.save()         # all saves commit together when the function returns

This will give a huge speed increase. You'll also get the benefit of rollbacks if any of the items fail. If you have millions of save operations then you may have to commit them in blocks, using commit_manually and transaction.commit(), as sketched below, but I've rarely needed that.
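
A minimal sketch of that chunked variant, against the same pre-1.8 transaction API (the function name, chunk_size, and modify_item are illustrative, not part of Django):

from django.db import transaction

@transaction.commit_manually
def lots_of_saves_chunked(queryset, chunk_size=10000):
    try:
        for i, item in enumerate(queryset, start=1):
            modify_item(item)
            item.save()
            if i % chunk_size == 0:
                transaction.commit()  # flush this block to disk
        transaction.commit()  # commit the final partial block
    except:
        transaction.rollback()
        raise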

JudoWill
  • This was almost magical. I wrapped a random function that creates a lot of records, and it instantly became several times faster in SQLite. – A B Jun 24 '11 at 20:02
  • For those of you running across this answer now, `commit_on_success` is now deprecated. Instead, replace that with `atomic`, which is the new version and will have the same magical effect! – Julia Ebert Jul 23 '14 at 17:47
  • `atomic` reference: https://docs.djangoproject.com/en/dev/topics/db/transactions/#django.db.transaction.atomic – Paolo Jun 15 '16 at 14:28
  • This definitely made my day, bulk insertion about 30x faster. Magical effect indeed!! – zeycus Mar 14 '17 at 16:21

80

New as of Django 1.6 is `atomic`, a simple API to control DB transactions. Copied verbatim from the docs:

`atomic` is usable both as a decorator:

from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()

and as a context manager:

from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()

Legacy django.db.transaction functions autocommit(), commit_on_success(), and commit_manually() have been deprecated and will be removed in Django 1.8.
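
Applied to the question's many-save() workload, a minimal sketch (lots_of_saves is an illustrative name, not from the docs):

from django.db import transaction

def lots_of_saves(queryset):
    # One transaction for the whole loop: SQLite acquires the write lock
    # and syncs to disk once, instead of once per save().
    with transaction.atomic():
        for item in queryset:
            item.save()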

Fraser Harris
18

I think this is the method you are looking for: https://docs.djangoproject.com/en/dev/ref/models/querysets/#bulk-create

Code copied from the docs:

Entry.objects.bulk_create([
    Entry(headline='This is a test'),
    Entry(headline='This is only a test'),
])

In practice, that would look like:

my_entries = []
for i in range(100):
    my_entries.append(Entry(headline='Headline #' + str(i)))

Entry.objects.bulk_create(my_entries)

According to the docs, this executes a single query regardless of the size of the list (maximum 999 items on SQLite3), which can't be said for the atomic decorator.
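
If your list can grow past that limit, `bulk_create` also accepts a `batch_size` argument that splits the inserts into multiple queries. A sketch, reusing the list from above (the 999 matches the SQLite limit just mentioned):

# batch_size caps the rows per INSERT statement so large lists stay
# under SQLite's per-query variable limit
Entry.objects.bulk_create(my_entries, batch_size=999)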

There is an important distinction to make. It sounds like, from the OP's question, that he is attempting to bulk-create rather than bulk-save. The atomic decorator is the fastest solution for saving, but not for creating.

Chris Conlan
  • I am following a similar line of direction for one of my work. Is there a way around for models not having primary key? I am creating model objects using dictionary. Without pk I get exception "dict object has no attribute pk". – Adarsh Trivedi Aug 03 '20 at 19:51
0

"How can I aggregate a large number of save() calls into a single database operation?"

You don't need to. Django already manages a cache for you. You can't improve its DB caching by trying to fuss around with saves.

"write performance problems are probably related to the fact that I'm creating a large number of rows"

Correct.

SQLite is pretty slow. That's the way it is. Queries are faster than in most other DBs. Writes are pretty slow.

Consider more serious architecture change. Are you loading rows during a web transaction (i.e., bulk uploading files and loading the DB from those files)?

If you're doing bulk loading inside a web transaction, stop. You need to do something smarter. Use Celery or some other "batch" facility to do your loads in the background.

We try to limit ourselves to file validation in a web transaction and do the loads when the user's not waiting for their page of HTML.
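
A hypothetical sketch of pushing the load into a Celery task; MyModel, load_rows, and validated_rows are illustrative names, not from the answer:

# tasks.py
from celery import shared_task
from django.db import transaction

from myapp.models import MyModel  # hypothetical model

@shared_task
def load_rows(validated_rows):
    # Runs in a background worker, so the web request returns immediately;
    # the transaction still batches all the inserts into one commit.
    with transaction.atomic():
        for row in validated_rows:
            MyModel.objects.create(**row)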

S.Lott
  • Although it's fair to point out that SQLite is indeed inherently slow on writes due to the way it uses the disk, it doesn't have to be *this* slow: as suggested by a commenter, I started using the @commit_manually decorator and found a really substantial improvement. – kdt Aug 03 '10 at 10:24
  • SQLite3 is slow due to its file-locking mechanism. Locking and unlocking a file for write access takes time. Therefore, putting all of your saves in a transaction with `transaction.atomic()` will solve your problems, because many `save()` calls take place in a single file-lock/file-unlock cycle. – Chris Conlan Oct 16 '18 at 12:46