I have a piece of code that handles file uploads, and ideally I want to accept only text files (CSV, tab-delimited files, etc.). So I added this chunk of code:

import magic
docfile = request.FILES['docfile']
mimetype = magic.from_buffer(docfile.read(512), mime=True)
docfile.seek(0)  # rewind so later code can read the whole upload
if form.is_valid() and mimetype == 'text/plain':
     ....

Just recently one of my users tried uploading a text file and the system rejected it. The MIME type reported for that file is:

file --mime-type -b input_file.txt 
application/octet-stream

And of course, all of the previously uploaded files have been text/plain. What's the difference between these two? Is there a more "global" way to check if a file is a text file?

Stupid.Fat.Cat

1 Answer

I found this answer which is probably relevant:

Yet another method based on file(1) behavior:

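# Bytes treated as text: common control characters plus 0x20-0xff, excluding DEL (0x7f)
textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
# True if any byte is left after deleting every text byte
is_binary_string = lambda data: bool(data.translate(None, textchars))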

Example:

>>> is_binary_string(open('/usr/bin/python', 'rb').read(1024))
True
>>> is_binary_string(open('/usr/bin/dh_python3', 'rb').read(1024))
False
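
For an upload you usually have a file-like object rather than a path, so here is a minimal sketch of the same check wrapped for a stream. The name looks_like_text and the sample_size parameter are my own, not part of any library, and the sketch assumes the stream is seekable (io.BytesIO stands in for the uploaded file here). Note that an empty file counts as text with this test.

```python
import io

# Same byte set as the file(1)-style check: common control characters
# plus 0x20-0xff, excluding DEL (0x7f).
TEXT_CHARS = bytearray({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7f})


def looks_like_text(stream, sample_size=1024):
    """Return True if the first sample_size bytes contain only text-like bytes.

    Hypothetical helper: assumes `stream` is a seekable binary file object.
    """
    sample = stream.read(sample_size)
    stream.seek(0)  # rewind so the caller can still read the whole file
    # translate() deletes every text byte; anything left over is "binary"
    return not sample.translate(None, TEXT_CHARS)


# io.BytesIO objects stand in for uploaded files
tsv_upload = io.BytesIO(b"id\tname\n1\tAlice\n")
binary_upload = io.BytesIO(b"\x7fELF\x02\x01\x01\x00")

print(looks_like_text(tsv_upload))     # True
print(looks_like_text(binary_upload))  # False
```

Only the first chunk is inspected, so a file with binary data further in would still pass; raise sample_size if that matters.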
Jacques Gaudin
  • Would the proper way to handle this be to accept both binary and text files, then? I ask because I can technically add application/octet-stream as another accepted MIME type. – Stupid.Fat.Cat Feb 03 '17 at 19:58
  • If you go down that route, yes, you need to accept both and test the content afterwards. I don't know much about MIME types apart from GTK, but detection doesn't seem to be 100% reliable. – Jacques Gaudin Feb 03 '17 at 20:08
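
A rough sketch of the "accept both, then test the content" idea from the comments, in case it helps. ACCEPTED_MIMES and accept_upload are names made up for illustration, and the MIME string is assumed to come from whatever detection you already use (for example the magic.from_buffer call in the question). The allowlist itself is a guess at what you would want, so adjust it to your case.

```python
# Common control characters plus 0x20-0xff, excluding DEL (0x7f)
TEXT_CHARS = bytearray({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7f})

# Assumption: these are the only detected types worth a second look
ACCEPTED_MIMES = {"text/plain", "application/octet-stream"}


def accept_upload(mimetype, sample):
    """Hypothetical check: MIME allowlist first, then a byte-level text test.

    `mimetype` is the detected MIME string, `sample` the first bytes of the file.
    """
    if mimetype not in ACCEPTED_MIMES:
        return False
    # Reject if any byte survives after deleting all text-like bytes
    return not sample.translate(None, TEXT_CHARS)


print(accept_upload("application/octet-stream", b"a\tb\n1\t2\n"))  # True
print(accept_upload("application/octet-stream", b"\x00\x01\x02"))  # False
print(accept_upload("image/png", b"plain text"))                   # False
```

The byte test is what actually decides here; the MIME check only filters out types that were positively identified as something else.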