
I have a simple problem: in a system I'm developing, users can send us zip files and I need to filter their contents (blocking applications and malicious scripts).

Blocking the inner files by extension is easy, but files without an extension are very common, and the extension isn't a reliable indicator of a file's content anyway.

I've already tried python-magic, but it requires some packages that my server doesn't support, and the people running the server aren't going to help me. Moving the system to another server isn't an option either, so python-magic is out for me in this case.

Does anyone have an idea of how to check the file type by its header?

JasonMArcher
Jayme Tosi Neto

1 Answer


Not a direct answer, but the format of /etc/magic is not that complicated, so if it's only a few file types you need to detect, it's perhaps easiest to write your own detection routine. For example, from this entry:

# Java

0       beshort         0xcafe
>2      beshort         0xbabe          application/x-java-applet

we get:

with open(path, 'rb') as f:  # binary mode, so the bytes compare correctly
    data = f.read()
if data[0:4] == b'\xca\xfe\xba\xbe':
    mimetype = 'application/x-java-applet'
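
If you need more than one type, the same idea extends to a small lookup table. The following is only a minimal sketch, not a replacement for libmagic: the names `SIGNATURES`, `sniff_type` and `sniff_file` are made up for illustration, the labels are just strings chosen for this example, and the table covers only a handful of well-known leading byte sequences.

```python
# Hypothetical signature table: (leading bytes, label). Extend as needed.
SIGNATURES = [
    (b"\xca\xfe\xba\xbe", "application/x-java-applet"),  # Java class file
    (b"\x7fELF", "application/x-executable"),            # ELF binary
    (b"MZ", "application/x-dosexec"),                    # DOS/Windows executable
    (b"PK\x03\x04", "application/zip"),                  # ZIP local file header
    (b"#!", "text/x-script"),                            # script with a shebang line
]

# The longest signature decides how many bytes have to be read.
MAX_SIGNATURE_LEN = max(len(sig) for sig, _ in SIGNATURES)


def sniff_type(header):
    """Return the label of the first signature `header` starts with, or None."""
    for signature, label in SIGNATURES:
        if header.startswith(signature):
            return label
    return None


def sniff_file(path):
    """Read only the leading bytes of a file on disk and classify them."""
    with open(path, "rb") as f:
        return sniff_type(f.read(MAX_SIGNATURE_LEN))
```

A file that matches nothing comes back as `None`, so the caller has to decide whether unknown content is allowed or rejected.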
Mr Shark
  • You really only need to read 4 bytes, so you could just do `data = yourfile.open(path).read(4)`, rather than trying to decompress and read the file in its entirety. – Paul Fisher Feb 02 '11 at 12:54
  • @Paul: yes, of course we can make those optimizations, but that is left as an exercise for the reader :-) – Mr Shark Feb 02 '11 at 13:13
  • @Jayme: What is the problem with this solution? – Mr Shark Feb 02 '11 at 13:14
  • I didn't understand your answer! ;P But the code you put here, '\xca\xfe\xba\xbe', where did you get it? I've been looking in many places and only found codes for Windows formats... =/ – Jayme Tosi Neto Feb 02 '11 at 13:20
  • cafebabe is the hex value that starts every Java class file. – Mr Shark Feb 02 '11 at 14:01
  • I found it in the file /usr/share/misc/magic.mime on my Ubuntu box; it should be googleable. `man magic` will give you the format. – Mr Shark Feb 02 '11 at 14:05
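
Since the files in the question live inside an uploaded zip, Paul's suggestion of reading only the first few bytes of each member could look roughly like the sketch below. It uses the standard `zipfile` module and nothing is extracted to disk. `BLOCKED_SIGNATURES` and `find_blocked_members` are hypothetical names, and the list of signatures is an assumption about what should be rejected, not a complete blocklist.

```python
import zipfile

# Hypothetical blocklist: leading bytes -> human-readable reason.
BLOCKED_SIGNATURES = {
    b"\xca\xfe\xba\xbe": "Java class",
    b"\x7fELF": "ELF executable",
    b"MZ": "DOS/Windows executable",
    b"#!": "script with shebang",
}


def find_blocked_members(zip_source):
    """Return {member name: reason} for members that start with a blocked signature.

    `zip_source` may be a path or a file-like object, as accepted by
    zipfile.ZipFile.
    """
    blocked = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Ask for just the leading 4 bytes of the member.
            with archive.open(info) as member:
                header = member.read(4)
            for signature, reason in BLOCKED_SIGNATURES.items():
                if header.startswith(signature):
                    blocked[info.filename] = reason
                    break
    return blocked
```

Calling `find_blocked_members("upload.zip")` (with `upload.zip` standing in for whatever the user sent) gives a dict you can use to reject the upload or skip individual members.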