I'm already checking the content type, size, and extension (Django (audio) File Validation), but I need a library that reads the file and confirms that it is in fact what it claims to be (mostly mp3 and mp4).

I've been here: http://wiki.python.org/moin/Audio/ but no luck. Been at this one for a while, am a bit lost in the woods. Relying on SO big time for this whole end of things...

Thanks in advance.

EDIT: I'm already (in Django) using the UploadedFile.content_type attribute:

"The content-type header uploaded with the file (e.g. text/plain or application/pdf). Like any data supplied by the user, you shouldn't trust that the uploaded file is actually this type. You'll still need to validate that the file contains the content that the content-type header claims -- 'trust but verify.'"

So, I'm already reading the header. But how can I validate the actual content of the file?

Matt Parrilla

3 Answers

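You can call the Unix 'file' command from Python like this:

>>> import subprocess
>>> filename = 'Giant Steps.mp3'
>>> # os.system only returns the exit status; check_output captures the text.
>>> # Passing a list avoids shell quoting problems with the space in the name.
>>> output = subprocess.check_output(['file', filename])
>>> print(output.decode().strip())
Giant Steps.mp3: ISO Media, MPEG v4 system, iTunes AAC-LC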

** See man pages for more details on the 'file' command if you want to go this route.

See this post for other options
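
If depending on the 'file' binary is a concern, the same kind of header sniffing can be roughly sketched in pure Python. This is only a minimal illustration written for Python 3, not a replacement for 'file': the function name sniff_audio_type is made up for this example, and it only looks at the first few bytes (an 'ftyp' box for MP4 containers, an ID3v2 tag or an MPEG frame sync for MP3), so it says nothing about whether the rest of the file is valid.

```python
def sniff_audio_type(path):
    """Guess 'mp3' or 'mp4' from the first bytes of a file, else None.

    Hypothetical helper for illustration; it checks magic bytes only.
    """
    with open(path, 'rb') as f:
        head = f.read(12)

    # MP4/M4A containers normally begin with a box whose type,
    # stored at bytes 4-7, is 'ftyp'.
    if len(head) >= 8 and head[4:8] == b'ftyp':
        return 'mp4'

    # Many MP3 files begin with an ID3v2 tag.
    if head[:3] == b'ID3':
        return 'mp3'

    # Otherwise an MPEG audio stream starts with a frame sync:
    # 0xFF followed by a byte whose top three bits are set.
    # This is a loose test and can match other MPEG audio data too.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return 'mp3'

    return None
```

Like 'file', this only inspects the header, so it catches mislabeled uploads but not a file that merely starts with the right bytes.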


Alex Gaudio
  • This seems like it wouldn't be a reliable cross-platform solution, but I could be wrong. – dkamins Jul 01 '11 at 01:28
  • This method just checks the file header as far as I can tell. I'm looking for an actual library that confirms that the file is an audio file. – Matt Parrilla Jul 02 '11 at 16:25
  • As Daenyth said, Mutagen is a great audio library. I think that the fear in validating audio for django is if someone uploaded a malicious file, they could potentially execute arbitrary code. For instance, when/if someone plays the malicious file through your website audio player, your audio player should be secure enough not to crash or execute arbitrary code. – Alex Gaudio Jul 03 '11 at 15:45

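If just checking the header isn't good enough, I'd recommend using mutagen to load the file. Its format-specific loaders should throw an exception if the file can't be parsed, and mutagen.File returns None when it can't identify the type at all.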

FYI, I do not think your approach is very scalable. Is it really necessary to read every byte of the file? What is your reason for not trusting the file header?

Daenyth
  • I'm not sure if it's necessary to be honest. I'm a newb and am just trying to follow what seems like a strong suggestion from the django docs. This website will be available to anyone and everyone so security **is** potentially an issue (no idea who user is), with that said, if there are better/other ways to go about this, or if it's just unnecessary, I'd definitely value any input above and beyond my explicit question. – Matt Parrilla Jul 02 '11 at 19:48

Use sndhdr

It does a little more than a content-type check: it reads the file and inspects its headers. Of course, this is still not foolproof; using ffmpeg is probably the only option then.

Ambika Sukla