I want to classify file types based on their extensions in Python. Before writing it myself, I wanted to check whether there is a Python package that can be used for this purpose. By file type I mean classifying a file as e.g. doc, ppt, pdf, tar, txt, iso, etc. Ideally it would take the file name as input and return its type. I am running on Linux.
1
-
A file's extension has nothing to do with its type. – Burhan Khalid Sep 04 '12 at 06:48
-
Take a look at this question: http://stackoverflow.com/questions/43580/how-to-find-the-mime-type-of-a-file-in-python. You can *guess* by extension using `mimetypes`, but something like `python-magic` (mentioned in the second answer) may be more reliable. – kenm Sep 04 '12 at 06:51
-
Not *nothing* (you hope they're related), but they are definitely not the same thing. E.g., you can change the extension of a `.jpg` to `.doc`, but the type is still JPEG. – Matthew Adams Sep 04 '12 at 06:53
-
I just want to classify based on what the extension says. I'm not bothered about the actual content of the file. Any help now? – auny Sep 04 '12 at 06:57
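For the extension-only case described here, a minimal sketch using just the standard library might look like the following. The function name `classify_by_extension` and the shape of its return value are assumptions made for illustration, not something from the thread. It returns the bare extension, plus whatever `mimetypes` guesses from the name alone; the file is never opened.

```python
import mimetypes
import os


def classify_by_extension(filename):
    """Return (extension, mime_type, encoding) based only on the file name.

    Hypothetical helper for illustration: the file's content is never read,
    so a renamed file is classified by whatever its name claims.
    """
    # splitext only looks at the last suffix: "a.tar.gz" -> ".gz"
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    # guess_type returns (None, None) when the extension is unknown
    mime_type, encoding = mimetypes.guess_type(filename)
    return (ext or None, mime_type, encoding)


if __name__ == "__main__":
    for name in ["report.PDF", "notes.txt", "archive.tar.gz", "README"]:
        print(name, classify_by_extension(name))
```

If you need your own labels (say, grouping `doc` and `docx` together), a plain dict keyed on the returned extension is probably enough.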
2 Answers
2
You should look into a document metadata parser. I have used Apache Tika, which is a Java library, in some of my projects. See the question "Python-based document metadata parser?" for how to use it from Python.

Pratik Mandrekar
1
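On Linux you can use the `file` utility, which determines a file's type. You can call it from your scripts too:
import subprocess
# Runs `file` on 'yourfile'; the description is printed to stdout
# and the command's exit status is returned.
subprocess.call(['file', 'yourfile'])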

Denis
-
The `file` command uses the libmagic library; there is a `python-magic` module that provides a native interface and uses the same logic. – neutrinus Mar 13 '13 at 15:57
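If you want the result of `file` as a string in Python rather than printed to the terminal, one possible sketch is below. The helper name `file_mime_type` is made up for illustration, and it assumes a `file` binary that supports the `--brief` and `--mime-type` options (as the common Linux implementation does). It returns `None` when `file` is not installed.

```python
import shutil
import subprocess


def file_mime_type(path):
    """Return the MIME type reported by the `file` utility, or None.

    Hypothetical helper: it inspects the file's content, so the result
    can disagree with what the extension suggests.
    """
    if shutil.which("file") is None:
        return None  # `file` is not available on this system
    result = subprocess.run(
        # "--" stops option parsing, so odd file names are treated as paths
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(file_mime_type("/etc/hostname"))
```

Note that this answers a different question from the extension-based approach: a text file renamed to `.doc` would still be reported as text.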