75

I have to insert 8000+ records into a SQLite database using Django's ORM. This operation needs to be run as a cronjob about once per minute.
At the moment I'm using a for loop to iterate through all the items and then insert them one by one.
Example:

for item in items:
    entry = Entry(a1=item.a1, a2=item.a2)
    entry.save()

What is an efficient way of doing this?

Edit: A little comparison between the two insertion methods.

Without commit_manually decorator (11245 records):

nox@noxdevel marinetraffic]$ time python manage.py insrec             

real    1m50.288s
user    0m6.710s
sys     0m23.445s

Using commit_manually decorator (11245 records):

[nox@noxdevel marinetraffic]$ time python manage.py insrec                

real    0m18.464s
user    0m5.433s
sys     0m10.163s

Note: The test script also does some other operations besides inserting into the database (downloads a ZIP file, extracts an XML file from the ZIP archive, parses the XML file) so the time needed for execution does not necessarily represent the time needed to insert the records.

Cœur
  • 37,241
  • 25
  • 195
  • 267

9 Answers9

131

You want to check out django.db.transaction.commit_manually.

http://docs.djangoproject.com/en/dev/topics/db/transactions/#django-db-transaction-commit-manually

So it would be something like:

from django.db import transaction

@transaction.commit_manually
def viewfunc(request):
    ...
    for item in items:
        entry = Entry(a1=item.a1, a2=item.a2)
        entry.save()
    transaction.commit()

Which will only commit once, instead at each save().

In django 1.3 context managers were introduced. So now you can use transaction.commit_on_success() in a similar way:

from django.db import transaction

def viewfunc(request):
    ...
    with transaction.commit_on_success():
        for item in items:
            entry = Entry(a1=item.a1, a2=item.a2)
            entry.save()

In django 1.4, bulk_create was added, allowing you to create lists of your model objects and then commit them all at once.

NOTE the save method will not be called when using bulk create.

>>> Entry.objects.bulk_create([
...     Entry(headline="Django 1.0 Released"),
...     Entry(headline="Django 1.1 Announced"),
...     Entry(headline="Breaking: Django is awesome")
... ])

In django 1.6, transaction.atomic was introduced, intended to replace now legacy functions commit_on_success and commit_manually.

from the django documentation on atomic:

atomic is usable both as a decorator:

from django.db import transaction

@transaction.atomic
def viewfunc(request):
    # This code executes inside a transaction.
    do_stuff()

and as a context manager:

from django.db import transaction

def viewfunc(request):
    # This code executes in autocommit mode (Django's default).
    do_stuff()

    with transaction.atomic():
        # This code executes inside a transaction.
        do_more_stuff()
monkut
  • 42,176
  • 24
  • 124
  • 155
  • 8
    This will instantiate them all as models, and run thousands of individual inserts. I've always had to drop to SQL and do manual batch inserts for this type of volume; Django isn't built for it. But yes, you definitely want a single transaction if you're doing it this way. – Glenn Maynard Jul 17 '09 at 00:26
  • I don't know the Django ORM that well, but doesn't the ORM just generate the SQL for you? And in a simple model with no foreign keys, doesn't a single instance translate to a single insert statement? – monkut Jul 17 '09 at 02:28
  • Hi could you please elaborate the same in terms of .net? It would be a great help , as i am facing the same situation –  Jul 29 '10 at 09:51
  • 2
    I don't have .net experience, but speaking from a general Database perspective, turn off AUTOCOMMIT and encapsulating INSERT statements between BEGIN/END TRANSACTION statements will be faster than using AUTOCOMMIT and running INSERTS alone. Note, these commands and how they are used can change based on the DB your using. If you want a .net or .net framework specific answer go ahead and start a new question. – monkut Jul 30 '10 at 01:56
  • 3
    Now that Django 1.4 is out, using https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create makes a lot more sense. The other fast alternative is to manually create a batch SQL insert. The tip here (committing in one transaction) will not be nearly as fast as sending in one insert. – Ben Regenspan Mar 29 '12 at 20:15
  • 2
    As of 1.9 bulk_create is working great. Note that you'll need to break up creation into batches with no more than 999 total added properties for SQLite. – Ezekiel Kruglick Jan 21 '16 at 03:19
  • transaction.commit_manually was removed in 1.8 https://docs.djangoproject.com/en/dev/internals/deprecation/ – Marc Laugharn Dec 10 '18 at 19:01
  • 1
    Worth noting that `transaction.atomic` will not make the code run any faster. Otherwise, excellent summary, thanks. – Owen Nov 23 '19 at 22:19
  • @MarcLaugharn Well then wtf! – AlxVallejo Jul 15 '20 at 15:33
  • wow, this answer is 11 years old... Maybe it's about time to remove the 1.X references... – monkut Jul 15 '20 at 23:43
11

Bulk creation is available in Django 1.4:

https://django.readthedocs.io/en/1.4/ref/models/querysets.html#bulk-create

sudip
  • 166
  • 11
A-IV
  • 2,555
  • 2
  • 21
  • 17
3

Have a look at this. It's meant for use out-of-the-box with MySQL only, but there are pointers on what to do for other databases.

Vinay Sajip
  • 95,872
  • 14
  • 179
  • 191
3

You might be better off bulk-loading the items - prepare a file and use a bulk load tool. This will be vastly more efficient than 8000 individual inserts.

ConcernedOfTunbridgeWells
  • 64,444
  • 15
  • 143
  • 197
3

To answer the question particularly with regard to SQLite, as asked, while I have just now confirmed that bulk_create does provide a tremendous speedup there is a limitation with SQLite: "The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used."

The quoted stuff is from the docs--- A-IV provided a link.

What I have to add is that this djangosnippets entry by alpar also seems to be working for me. It's a little wrapper that breaks the big batch that you want to process into smaller batches, managing the 999 variables limit.

  • The Django snippet should be unnecessary with Django ≥1.5, right? Since Django 1.5, there is a `batch_size` parameter that you can use: "The batch_size parameter controls how many objects are created in single query. The default is to create all objects in one batch, except for SQLite where the default is such that at maximum 999 variables per query is used. The batch_size parameter was added in version 1.5." – mic Sep 11 '20 at 04:29
2

You should check out DSE. I wrote DSE to solve these kinds of problems ( massive insert or updates ). Using the django orm is a dead-end, you got to do it in plain SQL and DSE takes care of much of that for you.

Thomas

Weholt
  • 1,889
  • 5
  • 22
  • 35
  • 1
    Another thing; If you decide to use plain SQL and if the SQL you`re inserting has the same fields each time, try using cursor.executemany(SQL, [list of entries to insert]). Much faster than running an insert per entry. – Weholt Feb 05 '11 at 01:26
-1
def order(request):    
    if request.method=="GET":
        cust_name = request.GET.get('cust_name', '')
        cust_cont = request.GET.get('cust_cont', '')
        pincode = request.GET.get('pincode', '')
        city_name = request.GET.get('city_name', '')
        state = request.GET.get('state', '')
        contry = request.GET.get('contry', '')
        gender = request.GET.get('gender', '')
        paid_amt = request.GET.get('paid_amt', '')
        due_amt = request.GET.get('due_amt', '')
        order_date = request.GET.get('order_date', '')
        print(order_date)
        prod_name = request.GET.getlist('prod_name[]', '')
        prod_qty = request.GET.getlist('prod_qty[]', '')
        prod_price = request.GET.getlist('prod_price[]', '')
        print(prod_name)
        print(prod_qty)
        print(prod_price)
        # insert customer information into customer table
        try:
            # Insert Data into customer table
            cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
            cust_tab.save()
            # Retrive Id from customer table
            custo_id = Customer.objects.values_list('customer_id').last()   #It is return
Tuple as result from Queryset
            custo_id = int(custo_id[0]) #It is convert the Tuple in INT
            # Insert Data into Order table
            order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
            order_tab.save()
            # Insert Data into Products table
            # insert multiple data at a one time from djanog using while loop
            i=0
            while(i<len(prod_name)):
                p_n = prod_name[i]
                p_q = prod_qty[i]
                p_p = prod_price[i]
                # this is checking the variable, if variable is null so fill the varable value in database
                if p_n != "" and p_q != "" and p_p != "":
                    prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
                    prod_tab.save()
                i=i+1
scotty3785
  • 6,763
  • 1
  • 25
  • 35
-2

I recommend using plain SQL (not ORM) you can insert multiple rows with a single insert:

insert into A select from B;

The select from B portion of your sql could be as complicated as you want it to get as long as the results match the columns in table A and there are no constraint conflicts.

Sergey Golovchenko
  • 18,203
  • 15
  • 55
  • 72
-2
def order(request):    
    if request.method=="GET":
        # get the value from html page
        cust_name = request.GET.get('cust_name', '')
        cust_cont = request.GET.get('cust_cont', '')
        pincode = request.GET.get('pincode', '')
        city_name = request.GET.get('city_name', '')
        state = request.GET.get('state', '')
        contry = request.GET.get('contry', '')
        gender = request.GET.get('gender', '')
        paid_amt = request.GET.get('paid_amt', '')
        due_amt = request.GET.get('due_amt', '')
        order_date = request.GET.get('order_date', '')
        prod_name = request.GET.getlist('prod_name[]', '')
        prod_qty = request.GET.getlist('prod_qty[]', '')
        prod_price = request.GET.getlist('prod_price[]', '')

        # insert customer information into customer table
        try:
            # Insert Data into customer table
            cust_tab = Customer(customer_name=cust_name, customer_contact=cust_cont, gender=gender, city_name=city_name, pincode=pincode, state_name=state, contry_name=contry)
            cust_tab.save()
            # Retrive Id from customer table
            custo_id = Customer.objects.values_list('customer_id').last()   #It is return Tuple as result from Queryset
            custo_id = int(custo_id[0]) #It is convert the Tuple in INT
            # Insert Data into Order table
            order_tab = Orders(order_date=order_date, paid_amt=paid_amt, due_amt=due_amt, customer_id=custo_id)
            order_tab.save()
            # Insert Data into Products table
            # insert multiple data at a one time from djanog using while loop
            i=0
            while(i<len(prod_name)):
                p_n = prod_name[i]
                p_q = prod_qty[i]
                p_p = prod_price[i]

                # this is checking the variable, if variable is null so fill the varable value in database
                if p_n != "" and p_q != "" and p_p != "":
                    prod_tab = Products(product_name=p_n, product_qty=p_q, product_price=p_p, customer_id=custo_id)
                    prod_tab.save()
                i=i+1

            return HttpResponse('Your Record Has been Saved')
        except Exception as e:
            return HttpResponse(e)     

    return render(request, 'invoice_system/order.html')
  • This code is running properly. I'm checked on my level. if any error occurring from your side so please check code again and understand what's meaning of this code and then try it again. and anybody knows better and easy way of inserting multiple data in a one time in the database, please share with us. Thank You – Mohit Mishra May 02 '19 at 12:39
  • 1
    Please include clarification, further explanation etc. directly into your answer instead of using [comments](https://stackoverflow.com/help/privileges/comment). Comments should be used for asking for more information or for suggesting improvements. – Joey May 02 '19 at 13:01