11

I have a list of filenames as follows:

files = [
    '/dl/files/4j55eeer_wq3wxxpiqm.jpg',
    '/home/Desktop/hjsd03wnsbdr9rk3k',
    'kd0dje7cmidj0xks03nd8nd8a3',
    ...
]

The problem is that most of the files do not have an extension in their filenames. What would be the best way to get the file extension of these files?

I don't know if this is even possible, because Python would treat all files as buffer or string objects that do not have any file type associated with them.

Can this be done at all?

Amyth
  • http://stackoverflow.com/questions/14412211/get-mimetype-of-file-python but that will give you type not the extension though – 0xAli Jun 01 '13 at 11:25
  • Should this be portable? UNIX's `file` is usually really good at finding the file types... – Bakuriu Jun 01 '13 at 11:25
  • @Bakuriu: `file` uses `libmagic`, and that is cross-platform. See the `python-magic` library John Zwinck links to below. – Martijn Pieters Jun 01 '13 at 11:27

3 Answers

16

Once you use magic to get the MIME type, you can use mimetypes.guess_extension() to get the extension for it.
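
As a minimal sketch of the second step, the snippet below maps a MIME type string to an extension using only the standard library. It assumes the MIME string has already been obtained elsewhere (for example from python-magic). The `PREFERRED` table and the `extension_for` helper are hypothetical names introduced here for illustration; they are not part of `mimetypes`.

```python
import mimetypes

# Hypothetical preference table (an assumption, not part of mimetypes):
# it overrides types for which guess_extension() may pick an unusual
# extension on some Python versions.
PREFERRED = {"image/jpeg": ".jpg", "text/plain": ".txt"}


def extension_for(mime_type):
    """Return a plausible extension for a MIME type, or None if unknown."""
    if mime_type in PREFERRED:
        return PREFERRED[mime_type]
    # Falls back to the standard library's guess.
    return mimetypes.guess_extension(mime_type)


print(extension_for("image/jpeg"))
# Every extension the standard library knows for this type:
print(mimetypes.guess_all_extensions("image/jpeg"))
# Unknown types yield None rather than raising an exception:
print(extension_for("application/x-no-such-type"))
```

The override table is a design choice, not a requirement: `guess_extension()` alone may be enough when any valid extension for the type is acceptable.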

Ignacio Vazquez-Abrams
  • why does `guess_extension` return `*.jpe` for mimetype `image/jpeg`, does it simply use split on '/' and then returns the first 3 characters of the second element prefixed with a `.` ?? – Amyth Jun 01 '13 at 11:37
  • @Amyth: It does have "guess" in the name, so a bit of leeway is expected. – Ignacio Vazquez-Abrams Jun 01 '13 at 11:38
  • Hmm, get it, but I guess, `return '.' + mime.split('/')[1]` would return more accurate results then ? – Amyth Jun 01 '13 at 11:41
  • No, because `text/plain` should not have `.pla` as an extension. – Ignacio Vazquez-Abrams Jun 01 '13 at 11:42
  • @Amyth: the reason it returns `.jpe` is that `*.jpe` is one of many valid extensions for a mimetype of `image/jpeg`. If you use [mimetypes.guess_all_extensions()](https://docs.python.org/2/library/mimetypes.html#mimetypes.guess_all_extensions) instead, you'll see the entire list of possibilities. It seems like [mimetypes.guess_extension()](http://docs.python.org/2/library/mimetypes.html#mimetypes.guess_extension) just takes the first element of this list. This is also the reason guessing the extension for the mimetype `text/plain` returns `.h` when `.txt` is the obvious choice. – pR0Ps Sep 13 '16 at 14:50
3

It can be done if you have an oracle that determines file types from their content. Happily at least one such oracle is already implemented in Python: https://github.com/ahupp/python-magic
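
To illustrate what such an oracle does, here is a deliberately tiny sketch that checks the leading "magic" bytes of a file against a handful of well-known signatures. It is not python-magic and covers only a few formats; the `SIGNATURES` table and the `sniff` and `sniff_file` helpers are hypothetical names made up for this example.

```python
# Leading byte signatures for a few common formats (far from exhaustive).
SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", "image/png", ".png"),
    (b"GIF87a", "image/gif", ".gif"),
    (b"GIF89a", "image/gif", ".gif"),
    (b"%PDF-", "application/pdf", ".pdf"),
]


def sniff(header):
    """Return (mime, extension) for a bytes header, or None if unrecognized."""
    for magic_bytes, mime, ext in SIGNATURES:
        if header.startswith(magic_bytes):
            return mime, ext
    return None


def sniff_file(path):
    """Read the first few bytes of the file at `path` and sniff its type."""
    with open(path, "rb") as fh:
        return sniff(fh.read(16))
```

A real library such as python-magic relies on libmagic's much larger signature database, so a table like this is only useful for seeing the principle or for handling a small, known set of formats.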

John Zwinck
0

The code below worked for me:

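import filetype

# The argument can be a file path, a bytes buffer, or an open file object.
fileinfo = filetype.guess('mock.jpg')
if fileinfo is not None:  # guess() returns None when the type is not recognized
    detectedExt = fileinfo.extension
    detectedmime = fileinfo.mime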

filetype package documentation

Pavithra B