Hi everybody who is kind enough to read this.

I'm working with a Java program. I didn't write it, but I'm refining it. The problem is that users can add files, and I want to validate that the added files are not compressed in any known format. In other words, I don't want people to be able to add a zip, rar, 7z or gz file, and so on.

Can anyone help me with an idea? Is this even possible?

Thanks in advance.

*Edit: The program is used by IT students. They add the files (.java, .class, .php, .doc, .mdb) of their source code, the paths are saved in strings, and at the end the program zips the files and sends them to the teacher. Now the instructor doesn't want to receive zipped or compressed files inside that archive; that's the reason for the validation.

Ccortina
  • Please check this question, it seems to be the same: http://stackoverflow.com/questions/4148987/checking-if-a-stream-is-a-zip-file – bhatanant2 Sep 10 '12 at 15:59
  • Add files to where? Is it a physical file? Can you just check the first few bytes of the file? E.g. a zip file begins with PK – gigadot Sep 10 '12 at 15:59
  • Is the purpose that you only want to have text files? Can you add binary files which are not compressed? A generic way to detect a compressed file is to compress it and see if it gets any smaller. If the file is non-trivial but gets no smaller, it has been compressed already, e.g. a PNG file is compressed. – Peter Lawrey Sep 10 '12 at 15:59
  • @PeterLawrey, compressing whole files usually sucks as it's quite a CPU/memory-intensive operation. You can just pick a few 4/8k blocks and see if they get compressed. – bestsss Sep 10 '12 at 16:05
  • Can you validate based on the file extension alone, or do you want to be able to send a file like `file.zip` if it's not actually a compressed file? – Peter Lawrey Sep 10 '12 at 16:09
  • This sounds like a half-baked solution to the wrong problem now getting fully baked. Probably the teacher didn't have four types of decompression programs, and didn't like the bother of having to figure out how to unpack files. Instead of making submission even worse for the class, why not just decompress the files before sending them to him? Or actually try to get better requirements instead of "just do it this way" requirements? – Edwin Buck Sep 10 '12 at 16:11
  • Well, the main problem is that the students are extremely lazy, and we can't control them at home, so we want to make the program dumb-proof. Also, the program was made by a group of students as a class project – Ccortina Sep 10 '12 at 16:14
  • If the students are taking the _extra_ step of compressing their files, it doesn't sound like the students are being lazy. Your teacher is either not stressing the importance of how he likes the submissions, or he's being lazy by not unpacking them. Poor management, even in schools, always blames the non-management. Don't play that up, fix it in a way that betters everyone. – Edwin Buck Sep 10 '12 at 16:19
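
The "compress a few blocks and see if they shrink" idea from the comments above could be sketched roughly as follows. This is only an illustration, not code from any commenter: the class name `CompressibilityProbe`, the method names, the 512-byte minimum and the 0.95 threshold are all assumptions picked for the example. It is a heuristic, so data that is already compressed or encrypted for other reasons (PNG images, for instance) will also be flagged.

```java
import java.util.zip.Deflater;

// Hypothetical helper: deflates a sample block and reports how much it shrank.
final class CompressibilityProbe {

    /** Returns compressedSize / originalSize for the first 'length' bytes of 'sample'. */
    static double compressionRatio(byte[] sample, int length) {
        if (length == 0) {
            return 1.0; // nothing to compress, so report "no gain"
        }
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(sample, 0, length);
            deflater.finish();
            byte[] out = new byte[8192];
            long compressed = 0;
            // Drain the deflater; only the output size matters, not the bytes.
            while (!deflater.finished()) {
                compressed += deflater.deflate(out);
            }
            return (double) compressed / length;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /**
     * Assumed rule of thumb: a block of at least 512 bytes that deflate
     * cannot shrink by more than 5% is probably compressed already.
     */
    static boolean looksCompressed(byte[] sample, int length) {
        return length >= 512 && compressionRatio(sample, length) > 0.95;
    }
}
```

In practice you would read one or a few 4k/8k blocks from the file and call `looksCompressed` on each, rather than compressing the whole file.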

3 Answers

You basically do the Java equivalent of the Unix command `file` on the file's bytes. Most files have an embedded fingerprint which gives hints to other programs as to what type of file it is. This fingerprint is typically called a "magic number".

7zip - '7', 'z', 0xBC, 0xAF, 0x27, 0x1C
gzip - 0x1F, 0x8B

One (incomplete) list of magic numbers can be found here.

Some files don't have magic numbers, in which case you have to look for other common items in the file which strongly hint that it's a file of the suspected type.

Relying on file name extensions will just have everyone eventually renaming the extension.
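
As a rough sketch of the magic-number check described in this answer, something like the following could work. The 7z and gzip signatures are the ones listed above; the zip and rar signatures are additions for this example (zip's "PK" prefix was mentioned in the comments). The class name `MagicNumberCheck` and its method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: matches the first bytes of a file against known signatures.
final class MagicNumberCheck {

    private static final Map<String, int[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("7z", new int[] {'7', 'z', 0xBC, 0xAF, 0x27, 0x1C});
        SIGNATURES.put("gzip", new int[] {0x1F, 0x8B});
        // Added for this sketch: zip local file header (an empty archive
        // starts with 'P','K',0x05,0x06 instead and is not matched here).
        SIGNATURES.put("zip", new int[] {'P', 'K', 0x03, 0x04});
        // Added for this sketch: prefix shared by RAR 4 and RAR 5 archives.
        SIGNATURES.put("rar", new int[] {'R', 'a', 'r', '!', 0x1A, 0x07});
    }

    /** Returns the name of the matching format, or null if nothing matches. */
    static String detect(byte[] header, int length) {
        for (Map.Entry<String, int[]> entry : SIGNATURES.entrySet()) {
            int[] sig = entry.getValue();
            if (length < sig.length) {
                continue; // not enough bytes to hold this signature
            }
            boolean match = true;
            for (int i = 0; i < sig.length; i++) {
                // Mask to compare as unsigned, since Java bytes are signed.
                if ((header[i] & 0xFF) != sig[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Reads up to the first 8 bytes of the file and checks them. */
    static String detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (total < header.length
                    && (n = in.read(header, total, header.length - total)) > 0) {
                total += n;
            }
        }
        return detect(header, total);
    }
}
```

A caller would reject the file whenever `detect` returns a non-null name. Note that this only recognises the formats in the table, so the list needs extending for every archive type you want to block.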

Cody Gray - on strike
Edwin Buck

Most compressed file types have a "magic number" at the beginning: a few bytes that indicate the type of file (this applies not only to compressed files but also to images etc.). You can check the file contents against known file types. You can search for "magic number file type".

Roger Lindsjö

FWIW, this function checks if a file is gzipped:

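import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Log and Closer are the answerer's own helper classes, not part of the JDK.
public static boolean isGzipped(File f) {
    InputStream is = null;
    try {
        is = new FileInputStream(f);
        byte[] signature = new byte[2];
        int nread = is.read(signature); // read the first two bytes, where gzip keeps its signature
        // gzip data starts with the magic bytes 0x1f 0x8b
        return nread == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b;
    } catch (IOException e) {
        Log.x(e); // log the error and treat an unreadable file as "not gzipped"
        return false;
    } finally {
        Closer.closeSilently(is); // close the stream, ignoring any failure from close()
    }
}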

See Closer.closeSilently() here.

Community
18446744073709551615