This is from man file
(here), for the unix file
command:
The magic tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable ... These files have a “magic number” stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof. The concept of a “magic” has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. ...
(my emphasis)
And here's one example of the "magic" that the file
command uses to identify an svg
file (see source for more):
...
0 string \<?xml\ version=
>14 regex ['"\ \t]*[0-9.]+['"\ \t]*
>>19 search/4096 \<svg SVG Scalable Vector Graphics image
...
0 string \<svg SVG Scalable Vector Graphics image
...
As described by man magic
, each line follows the format <offset> <type> <test> <message>
.
If I understand correctly, the code above looks for the literal "<?xml version="
. If that is found, it looks for a version number, as described by the regular expression. If that is found, it searches the next 4096 bytes until it finds the literal "<svg"
. If any of this fails, it looks for the literal "<svg"
at the start of the file, and so on.
Something similar could be implemented in Python.
Note there's also python-magic, which provides an interface to libmagic
, as used by the unix file
command.