23

I'm experimenting with a site that will allow users to upload audio files. I've read every doc that I can get my hands on but can't find much about validating files.

Total newb here (never done any file validation of any kind before) and trying to figure this out. Can someone hold my hand and tell me what I need to know?

As always, thank you in advance.

Matt Parrilla
  • 3,171
  • 6
  • 35
  • 54
  • what do you mean validate? Do you want to just validate that the user has upload any old file? Or validate that the file is actually an audio file? Or go further an validate a particular attribute of the audio file, i.e. it's length? – Timmy O'Mahony Jun 01 '11 at 00:08
  • 1
    I mean it in the following sense (from django docs): "Note that whenever you deal with uploaded files, you should pay close attention to where you're uploading them and what type of files they are, to avoid security holes. Validate all uploaded files so that you're sure the files are what you think they are. For example, if you blindly let somebody upload files, without validation, to a directory that's within your Web server's document root, then somebody could upload a CGI or PHP script and execute that script by visiting its URL on your site. Don't allow that." – Matt Parrilla Jun 01 '11 at 00:51
  • ok, have you got the file upload working to start with? – Timmy O'Mahony Jun 01 '11 at 00:53
  • Only in admin at this point. I thought I would set up validation before I created the form for the user to upload their file and test it myself first. If I'm going about it backwards let me know! – Matt Parrilla Jun 01 '11 at 01:58

1 Answers1

27

You want to validate the file before it gets written to disk. When you upload a file, the form gets validated then the uploaded file gets passed to a handler/method that deals with the actual writing to the disk on your server. So in between these two operations, you want to perform some custom validation to make sure it's a valid audio file

You could:

  • check if the the file is less then a certain size (good practice)
  • then check if the submitted file has a certain content type (i.e. an audio file)
    • this is pretty useless as someone could easily spoof it
  • then check that the file ends in a certain extension (or extensions)
    • this is also pretty useless
  • try read the file and see if it's actually audio

(I haven't tested this code)

models.py

class UserSong(models.Model):
    title = models.CharField(max_length=100)
    audio_file = models.FileField()

forms.py

class UserSongForm(forms.ModelForm):
     # Add some custom validation to our file field
     def clean_audio_file(self):
         file = self.cleaned_data.get('audio_file',False):
         if file:
             if file._size > 4*1024*1024:
                   raise ValidationError("Audio file too large ( > 4mb )")
             if not file.content-type in ["audio/mpeg","audio/..."]:
                   raise ValidationError("Content-Type is not mpeg")
             if not os.path.splitext(file.name)[1] in [".mp3",".wav" ...]:
                   raise ValidationError("Doesn't have proper extension")
             # Here we need to now to read the file and see if it's actually 
             # a valid audio file. I don't know what the best library is to 
             # to do this
             if not some_lib.is_audio(file.content):
                   raise ValidationError("Not a valid audio file")
             return file
         else:
             raise ValidationError("Couldn't read uploaded file")

views.py from utils import handle_uploaded_file

def upload_file(request):
    if request.method == 'POST':
        form = UserSongForm(request.POST, request.FILES)
        if form.is_valid():
            # If we are here, the above file validation has completed
            # so we can now write the file to disk
            handle_uploaded_file(request.FILES['file'])
            return HttpResponseRedirect('/success/url/')
    else:
        form = UploadFileForm()
    return render_to_response('upload.html', {'form': form})

utils.py

# from django's docs
def handle_uploaded_file(f):
    ext = os.path.splitext(f.name)[1]
    destination = open('some/file/name%s'%(ext), 'wb+')
    for chunk in f.chunks():
        destination.write(chunk)
    destination.close()

https://docs.djangoproject.com/en/dev/topics/http/file-uploads/#file-uploads https://docs.djangoproject.com/en/dev/ref/forms/fields/#filefield https://docs.djangoproject.com/en/dev/ref/files/file/#django.core.files.File

Timmy O'Mahony
  • 53,000
  • 18
  • 155
  • 177
  • 1
    This is a complete answer to file validation in Django. Basically, you'll always need to provide access to some library which can parse the content header or open the file to determine whether the content you received is of the type your expecting it to be. That's what the `(bool) some_lib.is_audio(file.content)` bit does, which is what you'll need to find a suitable implementation for. – Filip Dupanović Jun 01 '11 at 08:03
  • WOW! Thank you, thank you, thank you. This is beyond what I had hoped for. Pastylegs, you are my hero! – Matt Parrilla Jun 01 '11 at 17:32
  • Filip, is it enough to parse the header? UploadedFile.content_type() is already doing this. – Matt Parrilla Jul 02 '11 at 17:56
  • In the reply above you need to import os and content-type should be content_type. – user1140929 Jan 10 '12 at 13:37
  • 1
    As a corollary to this answer, [I've written a blog post that discusses image file validation when uploading an image via an URL](http://timmyomahony.com/blog/upload-and-validate-image-from-url-in-django/) – Timmy O'Mahony Jun 23 '14 at 11:27
  • `UploadedFile.content_type()` returns whatever content-type was supplied in the HTTP request, so that's not reliable at all. The best library to check the actual type is [libmagic/python-magic](https://github.com/ahupp/python-magic) in conjunction with checking the file extension, as shown in this answer. – nitely Feb 09 '18 at 04:07