3

I have a particular model that I'd like to perform custom validations on. I'd like to guarantee that at least one identifier field is always present when creating a new instance such that its impossible to create an instance without one of these fields, though no field in particular is individually required.

from django.db import models

class Security(models.Model):
    symbol = models.CharField(unique=True, blank=True)
    sedol = models.CharField(unique=True, blank=True)
    tradingitemid = models.Charfield(unique=True, blank=True)

I'd like a clean, reliable way to do this no matter where the original data is coming from (e.g., an API post or internal functions that get this data from other sources like a .csv file).

I understand that I could overwrite the models .save() method and perform validation, but best practice stated here suggests that raising validation errors in the .save() method is a bad idea because views will simply return a 500 response instead of returning a validation error to a post request.

I know that I can define a custom serializer with a validator using Django Rest Framework for this model that validates the data (this would be a great solution for a ModelViewSet where the objects are created and I can guarantee this serializer is used each time). But this data integrity guarantee is only good on that API endpoint and then as good as the developer is at remembering to use that serializer each and every time an object is created elsewhere in the codebase (objects can be created throughout the codebase from sources besides the web API).

I am also familiar with Django's .clean() and .full_clean() methods. These seem like the perfect solutions, except that it again relies upon the developer always remembering to call these methods--a guarantee that's only as good as the developer's memory. I know the methods are called automatically when using a ModelForm, but again, for my use case models can be created from .csv downloads as well--I need a general purpose guarantee that's best practice. I could put .clean() in the model's .save() method, but this answer (and related comments and links in the post) seem to make this approach controversial and perhaps an anti-pattern.

Is there a clean, straightforward way to make a guarantee that this model can never be saved without one of the three fields that 1. doesn't raise 500 errors through a view, 2. that doesn't rely upon the developer explicitly using the correct serializer throughout the codebase when creating objects, and 3. Doesn't rely upon hacking a call to .clean() into the .save() method of the model (a seeming anti-pattern)? I feel like there must be a clean solution here that isn't a hodge podge of putting some validation in a serializer, some in a .clean() method, hacking the .save() method to call .clean() (it would get called twice with saves from ModelForms), etc...

user1847
  • 3,571
  • 1
  • 26
  • 35

1 Answers1

3

One could certainly imagine a design where save() did double duty and handled validation for you. For various reasons (partially summarized in the links here), Django decided to make this a two-step process. So I agree with the consensus you found that trying to shoehorn validation into Model.save() is an anti-pattern. It runs counter to Django's design, and will probably cause problems down the road.

You've already found the "perfect solution", which is to use Model.full_clean() to do the validation. I don't agree with you that remembering this will be burdensome for developers. I mean, remembering to do anything right can be hard, especially with a large and powerful framework, but this particular thing is straightforward, well documented, and fundamental to Django's ORM design.

This is especially true when you consider what is actually, provably difficult for developers, which is the error handling itself. It's not like developers could just do model.validate_and_save(). Rather, they would have to do:

try:
    model.validate_and_save()
except ValidationError:
    # handle error - this is the hard part

Whereas Django's idiom is:

try:
    model.full_clean()
except ValidationError:
    # handle error - this is the hard part
else:
    model.save()

I don't find Django's version any more difficult. (That said, there's nothing stopping you from writing your own validate_and_save convenience method.)

Finally, I would suggest adding a database constraint for your requirement as well. This is what Django does when you add a constraint that it knows how to enforce at the database level. For example, when you use unique=True on a field, Django will both create a database constraint and add Python code to validate that requirement. But if you want to create a constraint that Django doesn't know about you can do the same thing yourself. You would simply write a Migration that creates the appropriate database constraint in addition to writing your own Python version in clean(). That way, if there's a bug in your code and the validation isn't done, you end up with an uncaught exception (IntegrityError) rather than corrupted data.

Kevin Christopher Henry
  • 46,175
  • 7
  • 116
  • 102
  • Great answer, @Kevin--thank you! Looks like the proper use of the Djanjo `try/except ValidationError/else save` idiom is a big part of the solution. To get the database guarantee that I can't write corrupted data do you think it's better to write a custom migration or simply add the validation logic to the .save() method where it's explicit in the model code rather than buried in a migration? Or is there a way to use Django to write this constraint outside of a migration? My fear is that no one looks at migrations so it might not be the best place for the code. Thanks for your great answer! – user1847 Jan 12 '18 at 20:01
  • Trying to understand the best way to guarantee database level integrity here. Answering my own comment it looks like adding validation in the `.save()` method isn't optimal because `.save()` isn't called on `queryset.update()` so I'd still be vulnerable to collecting invalid data. (Notably, the Django Admin interface uses `queryset.update()` so that's a huge open door for invalid data if validating at the `.save()` level.) Could you post and explain a custom django migration for database level integrity constraints for my model? – user1847 Jan 12 '18 at 20:31
  • @ColtonHicks: Sorry if I wasn't clear. Feel free to ignore the last paragraph. The answer to your question is to put the validation in `clean()`, *not* in `save()`. What I'm saying in the last paragraph is that if your rule is one that can be expressed at the database level (with a SQL constraint) then it can be useful to *also* do that. It provides additional protection for your data, works with any framework, can provide performance improvements, etc. A database constraint is part of the database definition (hence a migration), it's not Python and is not done on each save, – Kevin Christopher Henry Jan 12 '18 at 22:07
  • Last paragraph is super helpful! Your answer suggests two parts to the data integrity guarantee I want: 1. Using `try: model.full_clean(), except: ValidationError, else: .save()` pattern throughout the code. 2. Implement a database level constraint that forbids saving my model with one of these fields blank (in case data gets entered outside of my code pattern, say the Django admin interface, for example). If you could, I'd appreciate a code example for your last paragraph showing how to implement this database level constraint--I think with that you'd have the question fully covered :) – user1847 Jan 12 '18 at 23:03
  • @ColtonHicks: I found [this blog post](https://www.fusionbox.com/blog/detail/custom-database-constraints-in-django/594/) that explains exactly what I'm talking about. If you can't figure out the SQL for your specific constraint I suggest posting a new question about that and tagging your database (since the syntax can vary from one to the other). – Kevin Christopher Henry Jan 12 '18 at 23:21
  • Good point. I read the same blog post and was hoping for an example in code here for reference, but you're right--that code will vary by database. Better for a separate question if needed. Thanks again! – user1847 Jan 12 '18 at 23:26