
I want a Python script to behave differently depending on the type of a file. I cannot rely on the filename extension, as it may be missing or misleading. I could call the `file` utility and parse its output, but I would rather use a Python built-in for portability.

So is there anything in Python that uses heuristics to deduce the type of a file from its contents?

0x89

1 Answer


There are Python bindings for libmagic, such as python-magic (discussed in the comments below). Probably others as well. "magic" is the magic keyword to search for. ;-)
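
To make the "magic" idea concrete, here is a minimal standard-library sketch of the underlying technique: compare the first bytes of a file against known signatures. The names `SIGNATURES`, `sniff_bytes`, and `sniff_type` are hypothetical, and the table is a small illustrative sample, not a substitute for libmagic's database.

```python
# Illustrative signature table: (leading bytes, label). Deliberately tiny;
# libmagic knows far more formats and uses much richer rules than a prefix match.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),  # also matches ZIP-based formats (jar, docx, ...)
    (b"\x1f\x8b", "application/gzip"),
]


def sniff_bytes(head):
    """Return a label for the given leading bytes, or None if nothing matches."""
    for magic_bytes, label in SIGNATURES:
        if head.startswith(magic_bytes):
            return label
    return None


def sniff_type(path):
    """Read the first bytes of the file at `path` and match them against SIGNATURES."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))
```

A `None` result only means "not in this table"; plain text and source code have no header to match, so they always fall through here. If a wider range of types matters, the third-party python-magic package wraps libmagic itself (`magic.from_file(path, mime=True)`), at the cost of requiring libmagic to be installed.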

FabienAndre
Alex Brasetvik
  • `libmagic` isn't perfect for all files. It looks at the "magic number" in a file header. Text files, such as source code, don't have headers and libmagic has to resort to wild guessing ... it can be very wrong about them. – Jochen Ritzel Dec 29 '09 at 14:02
  • Such is the danger of all content-sniffing approaches. Often the number of ‘acceptable’ file types is smaller than the list known by libmagic, in which case ad-hoc app-level sniffing can be a better bet, but for the general case there's not much you can do about it. – bobince Dec 29 '09 at 14:12
  • libmagic is what `file` uses, so it's very, very hard to find a closer match to `file`. – Ignacio Vazquez-Abrams Dec 29 '09 at 16:07
  • Just as a note, four years later: pymagic no longer appears to be maintained, whereas python-magic is still alive and well. See the [duplicate answer](http://stackoverflow.com/a/21499463/1290438). – zmo Feb 01 '14 at 14:54
  • Update 2014: Both of these are dead. I think [filemagic](https://pypi.python.org/pypi/filemagic/) is the current library for this functionality. – jkitchen Feb 17 '14 at 16:50
  • Update 2014: My bad. [python-magic](https://github.com/ahupp/python-magic) is alive and well. – jkitchen Feb 17 '14 at 17:04