Is there a way to get the content type of an uploaded file when overriding the model's save method? I have tried this:

def save(self):
    print(self.file.content_type)
    super(Media, self).save()

But it did not work. In this example, self.file is a models.FileField:

file = models.FileField(upload_to='uploads/%m-%Y/')

Edit: I want to be able to save the content type to the database, so I'll need it before the save is actually complete :)

Hanpan
  • I think the file is saved AFTER the save() is done. I could be wrong though. So try flipping the 2nd and 3rd line around, so save() then print(). – dotty Jan 31 '11 at 16:50
  • That worked, but I'm going to need to get the data before hand as I want to save the content type to the database. I should have mentioned that in the original question. – Hanpan Jan 31 '11 at 16:56
  • So save it, get the mime type, fill your content type field, then super(...).save again. Should work just like updating. – Spacedman Jan 31 '11 at 17:43
  • Out of interest, would this hit the database twice? – Hanpan Jan 31 '11 at 17:45
  • Yes, it would hit the DB twice. – dotty Feb 09 '11 at 16:52

4 Answers

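from django import forms
from django.conf import settings
from django.template.defaultfilters import filesizeformat
from django.utils.translation import gettext_lazy as _


class MyForm(forms.ModelForm):

    def clean_file(self):
        file = self.cleaned_data['file']
        if not file:
            return file

        # content_type exists only on a freshly uploaded file and is supplied by the client.
        content_type = getattr(file, 'content_type', None)
        if content_type is None:
            return file

        # Only the main type (the part before the slash) is compared.
        file_type = content_type.split('/')[0]

        if len(file.name.split('.')) == 1:
            raise forms.ValidationError(_('File type is not supported'))

        if file_type not in settings.TASK_UPLOAD_FILE_TYPES:
            raise forms.ValidationError(_('File type is not supported'))

        # int() also accepts the limit if it is stored as a string in settings.
        max_size = int(settings.TASK_UPLOAD_FILE_MAX_SIZE)
        if file.size > max_size:
            raise forms.ValidationError(
                _('Please keep filesize under %s. Current filesize %s')
                % (filesizeformat(max_size), filesizeformat(file.size))
            )

        return file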

settings.py

TASK_UPLOAD_FILE_TYPES = ['pdf', 'vnd.oasis.opendocument.text', 'vnd.ms-excel', 'msword', 'application']
TASK_UPLOAD_FILE_MAX_SIZE = 5242880  # 5 MiB, in bytes
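
As a side note on what the check above actually compares: `content_type.split('/')[0]` yields only the main type, so for `application/pdf` it is the `'application'` entry in the list that matches, not `'pdf'`. A minimal sketch of matching the full MIME type instead, assuming a hypothetical `ALLOWED_MIME_TYPES` setting that is not part of the answer (and keeping in mind that the value still comes from the client):

```python
# Hypothetical allow-list of full MIME types (not part of the answer's settings).
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.oasis.opendocument.text",
}


def main_type(content_type):
    """Return the part before the slash, as the form code above does."""
    return content_type.split("/")[0]


def is_allowed(content_type):
    """Compare the full MIME type, ignoring parameters such as '; charset=...'."""
    mime = content_type.split(";")[0].strip().lower()
    return mime in ALLOWED_MIME_TYPES


print(main_type("application/pdf"))            # application
print(is_allowed("application/pdf"))           # True
print(is_allowed("application/x-msdownload"))  # False
```

Comparing the full type is stricter than comparing the main type, since every `application/...` upload passes the main-type check.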
Nathan Osman
moskrc
  • clean_file is generally called when you need to validate the form with is_valid() – Steve K Nov 23 '12 at 03:05
  • I think the except should catch an AttributeError. Otherwise, won't the forms.ValidationError be swallowed up? – Joe J Mar 08 '14 at 15:41
  • Do note that [`UploadedFile.content_type`](https://docs.djangoproject.com/en/2.1/ref/files/uploads/#django.core.files.uploadedfile.UploadedFile.content_type) comes from the user. It's generally best to use [`python-magic`](https://pypi.org/project/python-magic/), or built-in [`imghdr`](https://stackoverflow.com/a/16252722/52499). – x-yuri Apr 06 '19 at 13:14
  • Downvote: please do NOT promote trusting user data **user data can NEVER be trusted**. ALWAYS check the type of the file from SERVER SIDE SOURCE. – jave.web Oct 27 '20 at 14:18
  • This is an antipattern and should be removed as a valid answer. The requester can pass in any header they want. The file could be executable, but the header could state it's pdf and it would work just fine. – JackLeo Jan 20 '22 at 17:37

You can use PIL or magic to read the first few bytes and get the MIME type that way. I wouldn't trust the content_type, since anyone can fake an HTTP header.

Magic solution below. For a PIL implementation you can get an idea from Django's get_image_dimensions.

import magic


def get_mime_type(file):
    """
    Get MIME by reading the header of the file
    """
    initial_pos = file.tell()
    file.seek(0)
    mime_type = magic.from_buffer(file.read(2048), mime=True)
    file.seek(initial_pos)
    return mime_type

Here `file` is the in-memory uploaded file in the view.
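
To illustrate the same idea without any dependency, here is a rough stdlib-only sketch that checks the leading bytes against a few well-known signatures. The `SIGNATURES` table and the `sniff_mime_type` name are made up for this example; the table is deliberately tiny and is no substitute for libmagic, which recognizes far more formats.

```python
import io

# Hypothetical, deliberately tiny signature table; libmagic knows far more formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mime_type(file, default="application/octet-stream"):
    """Guess a MIME type from the leading bytes, restoring the file position."""
    initial_pos = file.tell()
    file.seek(0)
    header = file.read(16)
    file.seek(initial_pos)
    for magic_bytes, mime_type in SIGNATURES:
        if header.startswith(magic_bytes):
            return mime_type
    return default


upload = io.BytesIO(b"%PDF-1.7\n...")
print(sniff_mime_type(upload))  # application/pdf
```

As in `get_mime_type` above, the read position is restored afterwards, so the upload can still be saved or read again from wherever it was.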

Pithikos
  • As a minor helper here - if you're running a Python image in Docker you'll need to make sure libmagic1 is installed at the Dockerfile level when the initial dependencies are installed. Otherwise python-magic will try to wrap a system-level library that hasn't been installed. You can do this simply enough with: 'RUN apt-get update \ && apt-get install -y curl libmagic1 \ && apt-get -y autoclean' or whatever other stuff you're running at that point does. – thms Jul 26 '21 at 14:33
  • In its README, the magic library recommends using the first 2048 bytes, "as less can produce incorrect identification". – suayip uzulmez Jul 25 '22 at 09:47

I'm using Django Rest Framework and this is the simplest way to determine content type/mime type:

file = request.data.get("file")    # type(file) = 'django.core.files.uploadedfile.InMemoryUploadedFile'
print(file.content_type)

If I upload a JPEG image, for example, the output is:

image/jpeg

Let me know in the comments if this serves your purpose.

Anuj Gupta
  • This does **not** check the mime type of the file but the `Content-Type` header of the request. "Checking" a file this way will result in the same "mime type" whatever you upload, as long as the header stays the same, and trusting requests blindly (which can easily be forged), potentially letting users upload files that are harmful or not allowed. – Shin Feb 02 '21 at 06:36
  • thanks bro Anuj this was too helpful for an integration with the Google Drive API – Juan Diego Ramirez Apr 10 '21 at 00:07

You need to override the save method in the model class:

def save(self, *args, **kwargs):
    if self.file and self.file.file:
        try:
            # content_type exists only on a freshly uploaded file, not on one read back from storage.
            self.mime_type = self.file.file.content_type
        except AttributeError:
            pass
    # Without this call the model is never written to the database.
    super().save(*args, **kwargs)

This assumes the model has a file column (FileField) and a mime_type column (CharField).

NINSIIMA WILBER