
I wrote code in Node.js to decompress different file types (tar, tar.gz, etc.).

I do not have the filename available to me.

Currently I use brute force to decompress: I try each format in turn, and the first one that succeeds wins.

I want to improve this by knowing the compression type beforehand. Is there a way to do this?

leppie
guy mograbi
  • possible duplicate of [How to detect type of compression used on the file? (if no file extension is specified)](http://stackoverflow.com/questions/19120676/how-to-detect-type-of-compression-used-on-the-file-if-no-file-extension-is-spe) – Mels Jan 19 '15 at 12:45
  • @Mels The answers there involve hardcoding constants and hoping no new ones are introduced, or using OS-specific tools. This is about the fifth time in a year I have come across people needing a *file* module in Node.js, but there seems to be none. – alandarev Jan 19 '15 at 12:51
  • @alandarev There *has* been a [libmagic/libfile binding for node](https://github.com/mscdex/mmmagic) for quite some time. – mscdex Jan 19 '15 at 12:57
  • @mscdex Excellent. For some reason I thought it relied on the extension extracted from the filename. – alandarev Jan 19 '15 at 13:03

1 Answer


Your "brute force" approach actually works very well. Each decompressor determines very quickly, usually within the first few bytes, that it has been handed the wrong format, so only the one that matches does any real work.
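As a rough sketch of that try-each-decoder loop, assuming only the two formats that Node's built-in `zlib` module can decode here (gzip and zlib streams): the `decoders` table and the `bruteForceDecompress` name are hypothetical, and other formats such as bzip2 or xz would need third-party packages that are not shown.

```javascript
const zlib = require('zlib');

// Candidate decoders, tried in order. Only formats built into Node's
// zlib module are listed; this table is an illustrative assumption.
const decoders = [
  ['gzip', zlib.gunzipSync],
  ['zlib', zlib.inflateSync],
];

// Hypothetical helper: returns { format, data } for the first decoder
// that accepts the buffer, or null if none of them does.
function bruteForceDecompress(buf) {
  for (const [format, decode] of decoders) {
    try {
      return { format, data: decode(buf) };
    } catch (err) {
      // Wrong format: the decoder throws, typically on the header bytes.
    }
  }
  return null;
}
```

The order of the table matters only when more than one decoder could accept the same input, so stricter formats (those with a distinctive header) are best tried first.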

You can see this answer for a list of prefix bytes for common formats. You would also need to detect the tar format inside a compressed stream, which is not detailed there. Even when you find a matching prefix, you still have to go on to decompress and decode the data to test the hypothesis, which is essentially your brute force method.
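A prefix check along those lines might look like the sketch below. The signature table is deliberately partial and the names `SIGNATURES`, `sniffCompression`, and `looksLikeTar` are hypothetical. The tar check relies on the `ustar` magic at offset 257 of the first header block, which POSIX and GNU tar archives carry but very old tar archives do not, so treat a match only as a hint to be confirmed by actually decoding.

```javascript
const zlib = require('zlib');

// Leading "magic" bytes of a few common formats (partial list).
const SIGNATURES = [
  { format: 'gzip', bytes: [0x1f, 0x8b] },
  { format: 'bzip2', bytes: [0x42, 0x5a, 0x68] }, // "BZh"
  { format: 'xz', bytes: [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00] },
  { format: 'zip', bytes: [0x50, 0x4b, 0x03, 0x04] }, // "PK\x03\x04"
];

// Returns the name of the first signature that matches the start of
// the buffer, or null when nothing matches.
function sniffCompression(buf) {
  for (const { format, bytes } of SIGNATURES) {
    if (buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b)) {
      return format;
    }
  }
  return null;
}

// Checks for the "ustar" magic at offset 257 of the first tar header.
function looksLikeTar(buf) {
  return buf.length >= 262 && buf.toString('latin1', 257, 262) === 'ustar';
}
```

For a tar.gz, the two checks would be chained: if `sniffCompression` reports gzip, decompress (for example with `zlib.gunzipSync`) and then call `looksLikeTar` on the result.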

Mark Adler