What is the proper way to catch the duplicate key error?

Question

Initially I thought being as specific as possible is the best way to go about it:

from django.db import IntegrityError
from psycopg2.errorcodes import UNIQUE_VIOLATION

try:
    m = Model.objects.create(value=m2.old_value)
except IntegrityError as e:
    if e.__cause__.pgcode != UNIQUE_VIOLATION:
        raise
    m = Model.objects.get(value=m2.old_value)

However later I ran into the answer that says:

SQLAlchemy is an abstraction library and if it were to raise exceptions specific to the underlying dpapi adapter, it would significantly reduce the portability of code written within it. ... However, unless your error handling is very specific to the type raised by the dbapi layer, I'd suggest that inspecting the underlying type might be unnecessary as the SQLAlchemy exception that is raised will provide enough runtime information to do what you have to do.

django.db.IntegrityError is supposedly raised in the following cases:

INTEGRITY_CONSTRAINT_VIOLATION = '23000'
RESTRICT_VIOLATION = '23001'
NOT_NULL_VIOLATION = '23502'
FOREIGN_KEY_VIOLATION = '23503'
UNIQUE_VIOLATION = '23505'
CHECK_VIOLATION = '23514'
EXCLUSION_VIOLATION = '23P01'

I'm not sure under which circumstances these particular errors occur, but the general rule is, "don't ignore exceptions." And if I choose:

try:
    m = Model.objects.create(value=m2.old_value)
except IntegrityError as e:
    m = Model.objects.get(value=m2.old_value)

It sounds like I might ignore an exception I didn't mean to ignore. What would you suggest?

CryptoFool · Answer 1 · 2020-12-27T08:12:05.490

1

I don't see any problem with doing what you're doing. There's certainly no reason not to add code to get a clearer idea of what kinds of errors get thrown and are catchable under various conditions. If you're not sure if this code is right as is, then it's better to be too specific rather than not specific enough. After you've run and tested your code for a while, you'll know more about the "correctness" of what you're doing here. Maybe you'll decide later to make a change. Maybe you'll decide that you can treat larger groups of exceptions in a similar way.

One important thought though... You' aren't ignoring any errors here. There's a difference between ignoring an error, which means doing nothing different when one occurs than when it does not, and catching an error and then doing something different because that error occurred. That's not ignoring the error at all, and that's what you're doing here. So the question here is "is doing a get under all cases where a create throws an IntegrityError exception the right thing to do?". If the answer is "No", then that's where you have to test for specific error conditions and decide if the right thing is to do a get or let the error propagate because something is really wrong. That's just what you're doing. If the code you present is doing that correctly, then maybe it's exactly what you want.

Bottom line...do whatever you have to do to get the code to act right in all cases. If doing that means digging into the exception the way you are, so be it.

I'm curious...what are your thoughts on not allowing an exception to be thrown at all? That is, what about checking for the pre-existence of the record before attempting the create? Is there a reason you can't do this? Also, what about reversing the two operations? Might it make sense to do the get first, and then do the create if that fails? I'm only presenting these other options to explore all the options here, both for your benefit and that of the question itself.

edited Dec 27 '20 at 08:12

answered Dec 27 '20 at 07:50

CryptoFool

21,719
5
26
44

There is a reason not to check the code. The code is database-specific. If I check the code I tie myself up to `pg`, if I don't, I risk—okay, not ignoring—handling errors I didn't expect to handle. By the way, by [ignoring an exception](https://stackoverflow.com/q/28659462/52499) I meant doing nothing in the handler. Now then, why do I not do `get()`, followed by `create()` if the object doesn't exist? To avoid [race conditions](https://gist.github.com/x-yuri/da90370b106d9815064b99a1f9ce8030). It's a common idiom. And now that I look into it, there's already... – x-yuri Dec 27 '20 at 11:35
...the [`get_or_create()`](https://docs.djangoproject.com/en/3.1/ref/models/querysets/#get-or-create) method in Django. The funny thing is that they [`get()`, then `create()`](https://github.com/django/django/blob/3.1.4/django/db/models/query.py#L572-L576). A bug, or am I missing something? "Is doing `get()` under all cases where `create()` throws `IntegrityError` the right thing to do?" That's exactly the question I'm trying to find an answer for. And judging by the existence of the `get_or_create()` method, the answer exist. That is one option is always better than the other... – x-yuri Dec 27 '20 at 11:35
...Or at least better most of the time. – x-yuri Dec 27 '20 at 11:35
Um, I got carried away. My main question is, "Should I use the vendor-specific stuff?" The question "What to do first, get() or create()?" was brought up by you. But I'm also curious about the answer, since I thought doing INSERT first is common ground. – x-yuri Dec 27 '20 at 11:47
I'm sorry I got you sidetracked on the calling order, pre-checking and such. Race-condition is a fine reason to avoid that. 'nuff said there. Back to the main question...it seems strange to me that the SqlAlchemy API provides no way to isolate the error case you're interested in without going database-specific. But if that's the case, then what choice do you have? If there is no cross-db way to distinguish the error, I don't see a big problem with adding a db-specific check. You could add an assertion on the db type and put clear documentation at the assert site explaining the issue... – CryptoFool Dec 27 '20 at 12:12
...do you anticipate a db change, or the need to support multiple db backends? We all talk about keeping our code general to avoid some sort of lock-in, but we also all seldom swap out whatever it is we're talking about. IMO, it's not worth spending the time worrying about it as long as if/when a change happens, it doesn't lead to confusion or extra work. If you make sure that the issue is flagged when a change is made, and the reason for and solution to the problem is well documented in the code, then I don't a problem with using db-specific code, especially if that's all you've got. – CryptoFool Dec 27 '20 at 12:19
Actually this "what to do first" thing suggests the answer. The answer is `get_or_create()`. And looking more carefully, it doesn't do `get()` first. Well, it does, but then it does `create()`. And if the latter fails it does `get()` again. And as you can see it ignores the [database-specific code](https://github.com/django/django/blob/3.1.4/django/db/models/query.py#L612). In case I misguided you, I'm not using SQLAlchemy. The particular ORM was not the reason I quoted that SQLAlchemy answer. – x-yuri Dec 27 '20 at 15:00

What is the proper way to catch the duplicate key error?

1 Answers1