
The problem: a signal receiver checks to see if a model entry exists for certain conditions, and if not, it creates a new entry. In some rare circumstances, the entry is being duplicated.

Within the receiver function:

try:
    my_instance = MyModel.objects.get(field1=value1, field2=sender)
except:
    my_instance = MyModel(field1=value1, field2=sender)
    my_instance.save()

It's an obvious candidate for get_or_create, but aside from cleaning up that code, would using get_or_create help prevent this problem?

The signal is sent after a user action, but I don't believe the originating request is being duplicated, because that would have triggered other actions.

The duplication has occurred a few times in thousands of instances. Is this necessarily caused by multiple requests, or is there some way a duplicate thread could be created? And is there a way, perhaps with granular transaction management, to prevent the duplication?

Using Django 1.1, Python 2.4, PostgreSQL 8.1, and mod_wsgi on Apache2.

bennylope
  • Deleting my answer since I'm not addressing thread safety. Just pointing out that with your setup, if `MyModel` ever gets a duplicate created by any means, it will keep producing duplicates, because `get` raises a `MultipleObjectsReturned` exception that your bare `except` catches. – Yuji 'Tomita' Tomita Mar 11 '11 at 22:35

2 Answers


To prevent the signal from being handled twice, add a `dispatch_uid` parameter to the code that connects the receiver, as described in the docs.
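To see why `dispatch_uid` matters, here is a toy dispatcher in plain Python 3. It is not Django's code; `ToySignal`, `make_receiver` and the `MyModel` placeholder are hypothetical. As far as I know, Django keys each receiver roughly by the receiver's `id()` plus the sender, or by the `dispatch_uid` plus the sender when one is given. If the module holding the receiver is imported twice under different paths, you end up with two distinct function objects, so without a uid both get registered and both run. In Django itself the call would look something like `post_save.connect(create_entry, sender=MyModel, dispatch_uid="create_entry")`, where the signal and the uid string are assumptions for illustration.

```python
class ToySignal(object):
    """Toy signal that mimics how receivers are keyed; NOT Django's code."""

    def __init__(self):
        # Lookup key -> receiver function, in registration order.
        self.receivers = {}

    def connect(self, receiver, sender=None, dispatch_uid=None):
        # With a uid, the uid identifies the receiver; otherwise the
        # function object's id() does, so two copies of one function differ.
        if dispatch_uid is not None:
            key = (dispatch_uid, id(sender))
        else:
            key = (id(receiver), id(sender))
        if key not in self.receivers:
            self.receivers[key] = receiver

    def send(self, sender, **kwargs):
        # Call every receiver registered for this sender or for any sender.
        for (_, sender_id), receiver in list(self.receivers.items()):
            if sender_id in (id(sender), id(None)):
                receiver(sender=sender, **kwargs)


class MyModel(object):
    """Hypothetical placeholder for the real model class."""


def make_receiver(calls):
    # Each call builds a new function object, as a double import would.
    def create_entry(sender, **kwargs):
        calls.append(sender)
    return create_entry


plain_calls = []
plain = ToySignal()
plain.connect(make_receiver(plain_calls), sender=MyModel)
plain.connect(make_receiver(plain_calls), sender=MyModel)
plain.send(sender=MyModel)
print(len(plain_calls))  # 2: both copies ran

uid_calls = []
with_uid = ToySignal()
with_uid.connect(make_receiver(uid_calls), sender=MyModel, dispatch_uid="create_entry")
with_uid.connect(make_receiver(uid_calls), sender=MyModel, dispatch_uid="create_entry")
with_uid.send(sender=MyModel)
print(len(uid_calls))  # 1: the second connect was ignored
```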

Make sure that you have a transaction open; otherwise, the state of the table may change between the check (`objects.get()`) and the creation (`save()`).
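A minimal sketch of keeping the check and the insert inside one transaction, using Python 3's `sqlite3` module as a stand-in for the real database (the `mymodel` table and the `get_or_create_row` helper are hypothetical). SQLite's `BEGIN IMMEDIATE` takes the write lock before the `SELECT`, so a second connection cannot start its own check until the first one commits. On PostgreSQL, a plain transaction at the default READ COMMITTED isolation level does not take such a lock, so there you would need an explicit lock or a unique constraint to get the same guarantee.

```python
import os
import sqlite3
import tempfile


def get_or_create_row(conn, field1, field2):
    """Return (row_id, created), checking and inserting in one write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before the check
    try:
        rows = conn.execute(
            "SELECT id FROM mymodel WHERE field1 = ? AND field2 = ?",
            (field1, field2),
        ).fetchall()
        if rows:
            result = (rows[0][0], False)
        else:
            cur = conn.execute(
                "INSERT INTO mymodel (field1, field2) VALUES (?, ?)",
                (field1, field2),
            )
            result = (cur.lastrowid, True)
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return result


# isolation_level=None: we issue BEGIN/COMMIT ourselves.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
first = sqlite3.connect(path, isolation_level=None)
first.execute(
    "CREATE TABLE mymodel (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)"
)
# timeout=0: fail at once instead of waiting when the database is locked.
second = sqlite3.connect(path, isolation_level=None, timeout=0)

print(get_or_create_row(first, "value1", "sender"))   # (1, True)
print(get_or_create_row(second, "value1", "sender"))  # (1, False)

# While one connection holds the write lock, the other cannot start its check.
first.execute("BEGIN IMMEDIATE")
try:
    second.execute("BEGIN IMMEDIATE")
    blocked = False
except sqlite3.OperationalError:
    blocked = True
first.execute("ROLLBACK")
print(blocked)  # True
```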

Jerzyk
  • Awesome. Not sure why, but it's not in the Django 1.1 docs, although it's certainly in the Django 1.1 dispatch module. Going to try this. – bennylope Mar 11 '11 at 23:59

Perhaps this answer may help. Apparently, a transaction is properly used with get_or_create, but I haven't confirmed this. mod_wsgi is multi-process and multi-threaded (both configurable), which means that race conditions can definitely occur. My guess is that two separate requests that produce the same value for field1 are launched and happen to execute with just the right timing to add 'duplicate' entries.
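To illustrate the kind of race described here, a sketch with plain Python 3 threads standing in for two mod_wsgi worker threads. The `rows` list plays the role of the table and `receiver` is a hypothetical stand-in for the signal receiver. A `threading.Barrier` forces the unlucky timing: both threads finish their check before either one inserts, so both insert.

```python
import threading

rows = []                            # stands in for the MyModel table
rows_lock = threading.Lock()         # guards the list, not the check-then-insert
both_checked = threading.Barrier(2)  # forces the unlucky interleaving


def receiver(field1, field2):
    # Step 1: check whether a matching entry exists (like objects.get()).
    with rows_lock:
        exists = (field1, field2) in rows
    # Both threads wait here, so neither has inserted when the other checks.
    both_checked.wait()
    # Step 2: create the entry if the check found nothing (like save()).
    if not exists:
        with rows_lock:
            rows.append((field1, field2))


threads = [
    threading.Thread(target=receiver, args=("value1", "sender")) for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows))  # 2: both "requests" passed the check before either inserted
```

Holding one lock across both steps would close the window in this toy, but since mod_wsgi can also run several processes, an in-process lock is not enough in practice; the coordination has to happen in the database.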

If the combination of field1 and field2 in MyModel(field1=value1, field2=sender) must be unique, define a unique_together constraint on your model to further aid data integrity.
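A sketch of why the constraint helps, again with Python 3's `sqlite3` as a stand-in for PostgreSQL (the table and the `find`/`create_or_fetch` helpers are hypothetical). With `UNIQUE (field1, field2)` in place, the database rejects the second insert, and the loser of the race can catch `IntegrityError` and fetch the row the winner created. In Django the equivalent would be `unique_together = (("field1", "field2"),)` in the model's `Meta`. As I understand it, Django's `get_or_create` also falls back to a second `get()` when the save raises `IntegrityError`, and that fallback can only kick in when such a constraint exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mymodel ("
    " id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT,"
    " UNIQUE (field1, field2))"
)


def find(conn, field1, field2):
    """Return the id of the matching row, or None."""
    rows = conn.execute(
        "SELECT id FROM mymodel WHERE field1 = ? AND field2 = ?", (field1, field2)
    ).fetchall()
    return rows[0][0] if rows else None


def create_or_fetch(conn, field1, field2):
    """Try the insert; if the constraint rejects it, return the existing row."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO mymodel (field1, field2) VALUES (?, ?)", (field1, field2)
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another request inserted the same pair first; use its row.
        return find(conn, field1, field2), False


# Replay the race: both "requests" check first and see nothing...
a_found = find(conn, "value1", "sender")  # request A's check: None
b_found = find(conn, "value1", "sender")  # request B's check: None
# ...then both try to create, but only one row can exist.
first_result = create_or_fetch(conn, "value1", "sender")
second_result = create_or_fetch(conn, "value1", "sender")
print(first_result)   # (1, True)
print(second_result)  # (1, False)
print(conn.execute("SELECT COUNT(*) FROM mymodel").fetchall())  # [(1,)]
```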

Josh Smeaton
  • Thanks for reminding me. Yes, the constraint would solve the end problem and would be a good idea for general data integrity, but of course I'd like to avoid changing the production database if possible. – bennylope Mar 12 '11 at 00:03
  • @benny, understandable, but if the constraint is valid at the data level, then it should be applied. Mistakes in the data model should be corrected as soon as possible. In the meantime, however, `get_or_create` used within a transaction should serve your purposes. – Josh Smeaton Mar 12 '11 at 00:06