2

This is a beginner question. I'm working on a website that allows users to upload a video to a project model (via ModelForm) and I want to validate this file correctly. I originally declared the field in this way:

from django.db import models
from django.core.validators import FileExtensionValidator

def user_directory_path(instance, filename):
    """
    Code to return the path
    """

class Project(models.Model):
    """
    """
    # ...Some model fields...
    # Right now I'm only validating and testing with .mp4 files.
    video_file = models.FileField(
        upload_to=user_directory_path,
        validators=[FileExtensionValidator(allowed_extensions=['mp4'])]
    )

But I read in several places that it was better to use libmagic to check the file's magic numbers and make sure its contents match the extension and the MIME type. I'm very new to this, so I might get some things wrong.

I followed the validators reference to write a custom validator that uses magic. The documentation also talks about "a class with a __cal__() method," and the most upvoted answer here uses a class-based validator. The documentation says that this can be done "for more complex or configurable validators," but I haven't understood what would be a specific example of that and if my function-based validator is just enough for what I'm trying to do. I think it is, but I don't have experience to be sure.

This is what I have.

models.py

from django.db import models
from .validators import validate_media_file

def user_directory_path(instance, filename):
    """
    Code to return the path
    """

class Project(models.Model):
    """
    """
    # ...Some model fields...
    # Right now I'm only validating and testing with .mp4 files.
    video_file = models.FileField(
        upload_to=user_directory_path,
        validators=[validate_media_file]
    )

validators.py (basically taken from the example in the documentation)

import os
import magic

from django.core.exceptions import ValidationError
from django.utils.translation import gettext_lazy as _

def validate_media_file(value):
    """
    """
    # Upper case to then see if it's in magic.from_buffer() output
    file_extension = os.path.splitext(value.name)[1].upper()[1:]

    # A list because later I will validate other formats
    if file_extension not in ['MP4']:
        raise ValidationError(
            _('File %(value)s does not contain a valid extension'),
            params={'value': value},
        )

    elif file_extension not in magic.from_buffer(value.read()):
        raise ValidationError(
            _(<appropriate error message>),
            params={'value': value},
        )

The migration worked with this. I also tested it with a plain text file with a .mp4 extension and then with a different file (and extension) and it works. However, I'd like to know if I'm missing something by using this instead of a class-based validator, and also, as the title says, when should I use one because I might come across another situation in which I'd need to know it.

I know I haven't included the MIME type; I can do it later.

As an additional question, what would be an appropriate error message for when the output from magic.from_buffer() does not match the extension and/or MIME type? I thought about something saying the "file is corrupt," but I'm not sure. Actually, is this the output that's directly based on the magic numbers?

TheSprinter
  • 338
  • 1
  • 12
  • 1
    You would use a class based validator if you need the advantages of classes - inheritance and so on. If you know that you don't need that (for now), your function based validator is enough. Basically, a class with the magic method "__call__()" provides an interface which a function normally does. You just can call an object of this class like you would call your function. If you notice later, that you need this functionality in different modifications elsewhere, its no big deal to transform it into a class. – Yves Hary Oct 12 '20 at 11:17

1 Answers1

1

When to use class based validator?

In your example a function based validator is sufficient. If you ever need the advantages of OOP, classes and objects, then you should switch to a class based validator. Imagine the following very fictional source code:

class StartsWithValidator():
    def __init__(self, starts_with):
        self.starts_with = starts_with

    def __call__(self, value):
        if not str(value).startswith(self.starts_with):
            raise ValidationError(
                'Your string does not start with: {}!'.format(self.starts_with),
                params={'value': value}
            )

my_validator = StartsWithValidator('123')
test_string = '123OneTwoThree'
my_validator(test_string) # Will it pass the validator?

You can see here different qualities:

  1. With a class based validator, you can use objects. Objects share the same functionality with a different inner state. You now can set up a validator, which checks if the string starts with 'abc', '123', whatever without writing new code
starts_with_abc = StartsWithValidator('abc')
starts_with_123 = StartsWithValidator('123')
starts_with_whatever = StartsWithValidator('whatever')
  1. You can use inheritance. Imagine you want to reuse the starts-with-validation among with other features, you simply inherit from the `StartsWithValidator'-class.
class StartsWithABCValidator(StartsWithValidator):
    def __init__(self):
        super().__init__('ABC')

    def __call__(self, value):
        super().__call__(value)
  1. If your validator does a lot of complex things, a simple function can lead to bad readable code. If you use classes, you are able to encapsulate your functionality and group it together.
Yves Hary
  • 302
  • 1
  • 9
  • So, to see if I understand. In my case, I could actually use a class-based validator and instantiate it with each file format like `MediaFormatValidator('MP4')`, `MediaFormatValidator('AVI')`, etc., and run the same code in the `__call__()` method, provided that `magic.from_buffer()` includes this format in its output string—as it does with the MP4 format—right? Although, even if this is correct, using a list to see if the extension is in it, could still be enough. – TheSprinter Oct 12 '20 at 22:40
  • Yes, you got this right. And also I would recommend to use just a simple function until you reach the limitation of such an implementation. You notice it, when your code has too much if-statements, repeats itself (beware of Ctrl+C/Ctrl+V) or gets too big. There are a lot of other code smells, but you will find your way :-) – Yves Hary Oct 13 '20 at 06:22
  • Great. Yes, I can see it's an overkill in this case. – TheSprinter Oct 13 '20 at 09:18