0

This is my project setup, using standard Django startproject command with a single app:

Python 3.5.1
Django 1.9.7
PostgreSQL 9.5.3
Ubuntu 16.04

The app's models.py defines 2 models:

from django.db import models

class A(models.Model):
    n = models.PositiveIntegerField(primary_key=True)

class B(models.Model):
    a = models.ForeignKey(A, blank=True, null=True)
    m = models.CharField(max_length=20, db_index=True)

    class Meta:
        unique_together = ('a', 'm')

    def __str__(self):
        return '%s' % self.m

Here's my management command called execute.py to create instances of B:

from multiprocessing import Pool
from django import db
from django.core.management.base import BaseCommand
from .models import B

M = 'abcdef'

def create():
    obj, created = B.objects.get_or_create(m=M, defaults={'a': None})
    if created:
        print('obj=%s' % obj)

class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        B.objects.filter(m=M).delete()
        db.connections.close_all()
        n = 4
        pool = Pool(processes=n)
        results = []
        for _ in range(n):
            result = pool.apply_async(create)
            results.append(result)
        pool.close()
        for result in results:
            result.get()
        pool.join()

Running python manage.py execute causes 4 objs to be created instead of just 1. This is not always the case though. Sometimes, there are only 3 or even just 1 object created:

obj=abcdef
obj=abcdef
obj=abcdef
obj=abcdef

Am I missing something to enforce the uniqueness here?

traceback
  • 11
  • 4
  • 1
    thanks for giving us the extra mental excercise of tracking As and Bs – e4c5 Jul 15 '16 at 12:41
  • Possible duplicate of [Django: how to do get\_or\_create() in a threadsafe way?](http://stackoverflow.com/questions/6586552/django-how-to-do-get-or-create-in-a-threadsafe-way) – solarissmoke Jul 15 '16 at 12:50

1 Answers1

0

This behavior is explained in the documentation for get_or_create():

This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level [...], this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.

The m field of your B model has no unique=True constraint and this is causing the race-condition. Change the field definition to this:

m = models.CharField(max_length=20, db_index=True, unique=True)

and you should be fine.

Andrea Corbellini
  • 17,339
  • 3
  • 53
  • 69
  • Adding `unique=True` in the `m` field works. But what if there can be duplicate value for `m` and only `a` and `m` needs to be unique together as specified in the models.py? – traceback Jul 15 '16 at 13:05
  • @traceback: in that case you need to implement your own synchronization mechanism. You can't even use your own SQL statement: Postgresql offers `INSERT ... ON CONFLICT DO NOTHING`, but for that to work in your case, there needs to be a conflict (i.e., there must be a `UNIQUE` column) – Andrea Corbellini Jul 15 '16 at 13:24
  • This index was created in PostgreSQL `"app_b_a_id_0f0268ca_uniq" UNIQUE CONSTRAINT, btree (a_id, m)` after migrate. I am curious why `get_or_create` still attempts to create a new object given the `UNIQUE CONSTRAINT`. – traceback Jul 15 '16 at 13:31
  • That's a constraint on `B.a`, not on `B.m` – Andrea Corbellini Jul 15 '16 at 13:31
  • Okay, I think the `defaults` that uses NULL value is the cuplrit here: `obj, created = B.objects.get_or_create(m=M, defaults={'a': None})`. The issue does not occur when an instance of `A` is used in the `defaults`. – traceback Jul 15 '16 at 13:45
  • @traceback: if you want to use `unique_together`, then you must pass both `a` and `m` as keyword arguments to `get_or_create()`. Don't put `a` in `defaults` – Andrea Corbellini Jul 15 '16 at 13:48
  • @traceback Note that even with a unique constraint, you can have many `NULL` values without any conflicts. A unique constraint only enforces uniqueness of non-null values. In extent, you can have duplicate values for `m` with a `NULL` values for `a` if you have a unique constraint on `a ` and `m` together. – knbk Jul 15 '16 at 14:39
  • @knbk Indeed, I found that out the hard way :) Thanks! – traceback Jul 15 '16 at 23:02