How can I recognise a text file from my Linux PC in Django code without checking its extension, and also get its file size?

Question

Most of the time, when we create a new text file with gedit on Linux, the file is not saved with a .txt extension. So how can I recognise it in Django code, given that I can't check the file extension? Here is my code.

Let's say I have a resume field for each user in the following models.py:

class User(AbstractUser):

    resume = models.FileField(upload_to=get_attachment_file_path, default=None, null=True, validators=[validate_file_extension])

Now I want to validate the file against the allowed extensions, so I made a validators.py as below:

import os
import sys

from django.core.exceptions import ValidationError


def validate_file_extension(fieldfile_obj):
    megabyte_limit = 5.0
    filesize = sys.getsizeof(fieldfile_obj)
    ext = os.path.splitext(fieldfile_obj.name)[1]
    print("extension:", ext)
    valid_extensions = ['.pdf', '.doc', '.docx', '.jpg', '.png', '.xlsx', '.xls', '.txt', '.odt']

    if ext.lower() not in valid_extensions:
        raise ValidationError('Unsupported file extension.')
    elif filesize > megabyte_limit * 1024 * 1024:
        raise ValidationError("Max file size is %s MB" % megabyte_limit)

Now whenever I upload a text file through my API, it says unsupported file type, because the code cannot get an extension from the Linux text file. So how can I recognise a text file that is saved as just demo rather than demo.txt, even though its properties show that it is a text file?

My next question is how to get the size of each file uploaded to that FileField. I am using PostgreSQL as the DBMS.

@amrit You can do `import os` then do `os.path.getsize('sample_file.extension')` and it returns the size in bytes. — Eddie, Dec 29 '16 at 07:55
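Inside a validator the upload may not yet be saved to a path that os.path.getsize can read, and sys.getsizeof only measures the Python object in memory. A minimal sketch of the size check, assuming the validator receives one of Django's file objects, which expose a size attribute in bytes. The names exceeds_size_limit and MAX_UPLOAD_BYTES are hypothetical, and SimpleNamespace only stands in for a real upload here:

```python
from types import SimpleNamespace

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # 5 MB, mirroring the limit in the question


def exceeds_size_limit(uploaded, limit=MAX_UPLOAD_BYTES):
    """Return True if the upload is larger than `limit` bytes.

    `uploaded` is anything with a `.size` attribute in bytes, which is what
    Django's uploaded-file and FieldFile objects provide.
    """
    return uploaded.size > limit


# Stand-in objects for illustration only; a real validator receives the upload.
small = SimpleNamespace(name="demo", size=1024)
large = SimpleNamespace(name="big.pdf", size=6 * 1024 * 1024)
print(exceeds_size_limit(small))  # False
print(exceeds_size_limit(large))  # True
```

In the validator itself, this check would replace the sys.getsizeof call and raise ValidationError when it returns True.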

score 3 · Answer 1 · edited May 23 '17 at 10:30

You probably want to detect the upload's MIME type regardless of file extension, and that's often done by reading the file header to detect "magic numbers" or other bit patterns indicating the true nature of a file. Often text files are an edge case, where no header is detected and the first x bytes are printable ASCII or Unicode.
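To make the idea concrete, here is a rough sketch of header sniffing using only the standard library. The signature table is deliberately tiny and illustrative, and sniff_type is a hypothetical helper, not part of any library. The text fallback assumes "text" means "no NUL bytes and valid UTF-8", which is a simplification:

```python
import codecs

# Illustrative signatures only; real detectors know many more formats.
MAGIC_NUMBERS = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"PK\x03\x04": "application/zip",  # also the container for .docx/.xlsx/.odt
}


def sniff_type(head: bytes) -> str:
    """Guess a MIME type from the first bytes of a file."""
    for signature, mime in MAGIC_NUMBERS.items():
        if head.startswith(signature):
            return mime
    # No known header: treat it as text if it has no NUL bytes and is valid UTF-8.
    if b"\x00" not in head:
        try:
            # final=False tolerates a multi-byte character cut off at the end
            # of the sample, but still rejects invalid byte sequences.
            codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return "application/octet-stream"


with_header = sniff_type(b"%PDF-1.4\n")
no_header = sniff_type(b"my resume, saved as demo\n")
print(with_header, no_header)
```

Note that an empty sample is reported as text/plain by this sketch, so a real validator would want to reject empty uploads separately.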

While that's a bit of a rabbit hole to dive into, there are a few Python libraries that will do it for you. For example, https://github.com/ahupp/python-magic will work for your needs by inferring the MIME type from the file contents, which you then match against the types you want to accept.

A somewhat related set of example code specific to your needs can be found here: https://stackoverflow.com/a/28306825/7341881
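As a rough sketch of how that could look, assuming python-magic and libmagic are installed: the only library call used is magic.from_buffer(data, mime=True). The names is_allowed_upload and ALLOWED_MIME_TYPES, the 2048-byte sample size, and the short list of types are all assumptions for illustration; extend the set with the office document types you accept.

```python
ALLOWED_MIME_TYPES = {
    "text/plain",
    "application/pdf",
    "image/jpeg",
    "image/png",
}


def _detect_with_libmagic(data: bytes) -> str:
    # Imported lazily so the rest of the sketch runs without the dependency.
    import magic  # third-party: pip install python-magic (needs libmagic)

    return magic.from_buffer(data, mime=True)


def is_allowed_upload(fileobj, detect=_detect_with_libmagic, sniff_bytes=2048):
    """Return (allowed, mime) for a binary file-like object.

    `detect` is injectable so the logic can be exercised without libmagic.
    """
    start = fileobj.tell()
    head = fileobj.read(sniff_bytes)
    fileobj.seek(start)  # leave the file where we found it for later readers
    mime = detect(head)
    return mime in ALLOWED_MIME_TYPES, mime
```

In a Django validator you would call is_allowed_upload(fieldfile_obj) and raise ValidationError when the first element of the result is False. A file named demo with no extension is then accepted as long as its contents are detected as text/plain.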

Edit: Eddie's solution is functionally equivalent; python-magic wraps libmagic, which is what Linux's native "file" command taps into. If you do decide to go the subprocess route, be extra careful that you're not creating a security vulnerability by improperly sanitizing user input (e.g. the user-provided filename). This could lead to an attack granting arbitrary access to your server's runtime environment.

Eddie · Answer 2 · 2016-12-26T11:09:05.127

Easy 3-line solution with no external dependencies.

import subprocess

file_info = subprocess.getoutput('file demo')
print(file_info)

On POSIX systems (Linux, Unix, Mac, BSD, etc.) you can use the file command. For example, file demo will display the file info even if the file has no extension.

demo is the argument to the file command; in other words, the actual file you are trying to identify.

Disclaimer: be extra careful when running external commands.
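One way to be careful, sketched under the assumption that the path may contain user-controlled text: pass the command as an argument list so that no shell ever parses it, and ask file for the MIME type only. The helper name mime_type_via_file is hypothetical; --brief and --mime-type are standard options of the file command.

```python
import shutil
import subprocess


def mime_type_via_file(path):
    """Ask the system `file` command for a MIME type; None if it is unavailable."""
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        # Argument list, no shell: the path is never interpreted as shell syntax.
        # "--" stops a name such as "-f" from being read as an option.
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For a plain text file saved as demo, calling mime_type_via_file("demo") should give text/plain, which can then be compared against a list of accepted types.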

Please follow this link for more info about the Python subprocess module: https://docs.python.org/3.6/library/subprocess.html
