
It has been a few years since this question was last asked, and the previous answers, such as those to

How to reliably detect file types?

point to old libraries, most of which no longer seem to be maintained. So I figured this question deserves to be asked again.

What I need is a good way to identify the type of a file based on its content, something like the `file` command in Linux. In particular, I am interested in type detection for crypto files.
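Much of what `file` does comes down to comparing the first bytes of a file against a database of known signatures ("magic numbers"). A minimal sketch of that idea using only the Java standard library (Java 11+) could look like this. The class name `MagicSniffer` and the four signatures in the table are illustrative choices, not an existing API, and a useful table would need many more entries.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: a tiny signature table in the spirit of the magic database used by `file`.
final class MagicSniffer {

    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();

    static {
        SIGNATURES.put("image/png", new byte[] {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'});
        SIGNATURES.put("application/pdf", new byte[] {'%', 'P', 'D', 'F', '-'});
        SIGNATURES.put("application/zip", new byte[] {'P', 'K', 0x03, 0x04});
        SIGNATURES.put("application/gzip", new byte[] {0x1F, (byte) 0x8B});
    }

    /** Returns the MIME type of the first signature that prefixes the header, or null if none does. */
    static String sniff(byte[] header) {
        for (Map.Entry<String, byte[]> entry : SIGNATURES.entrySet()) {
            byte[] sig = entry.getValue();
            if (header.length >= sig.length
                    && Arrays.equals(Arrays.copyOf(header, sig.length), sig)) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Reads at most the first 16 bytes of the file and sniffs them. */
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(16));
        }
    }
}
```

Usage would be something like `MagicSniffer.sniff(Path.of("some.file"))`, with `null` meaning "unknown". Unlike `file`, this only checks fixed prefixes; it does no offset-based or text-based tests.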

A file may be called client.crt while its actual content is in PEM format, so the file extension is not really usable here. Furthermore, on Linux, file extensions are optional.
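Since PEM is a text format with a recognizable boundary line (`-----BEGIN <label>-----`, described in RFC 7468), the label itself can tell a certificate from a key regardless of the file name. A minimal sketch, assuming that scanning the first 8 KiB of the file is enough (explanatory text before the boundary is allowed, and some tools emit a lot of it); `PemSniffer` and `pemLabel` are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: detects PEM content by its "-----BEGIN <label>-----" line.
final class PemSniffer {

    // Lazy match so the label stops at the first closing "-----" on the same line.
    private static final Pattern BEGIN = Pattern.compile("-----BEGIN ([^\\r\\n]*?)-----");

    /** Returns the label of the first PEM block in the text (e.g. "CERTIFICATE"), if any. */
    static Optional<String> pemLabel(String text) {
        Matcher m = BEGIN.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Reads up to the first 8 KiB of the file and looks for a PEM boundary there. */
    static Optional<String> pemLabel(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(8192);
            // ISO-8859-1 maps every byte to exactly one char, so binary input decodes without errors.
            return pemLabel(new String(head, StandardCharsets.ISO_8859_1));
        }
    }
}
```

For a PEM-encoded client.crt this would typically yield `Optional[CERTIFICATE]`, while a PEM private key would yield labels such as `PRIVATE KEY` or `RSA PRIVATE KEY`.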

Apache Tika still seems to be under active development, but it is about 75 MB in size. It also seems to have limited support for crypto files.

I understand this is a complicated thing to do, and the `file` command is quite an advanced tool. Does anyone know a good way to deal with this?


EDIT

It looks as if this question has already received a close vote. I am not necessarily looking for a library that can do this; code examples using the Java standard library are good enough. However, this is too tedious to write from scratch (hence the absence of a code example). A good code example would be great and would also make the question a better fit for Stack Overflow.
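Binary crypto files (DER-encoded certificates or keys, often named .der or .cer) have no text header, but they usually consist of a single ASN.1 SEQUENCE: the tag byte 0x30 followed by a length that should cover exactly the rest of the file. A heuristic sketch using only the standard library follows; `DerSniffer` is a hypothetical name, and a match only means "probably DER", since other binary data can start with 0x30 by coincidence. PKCS#12 files are often DER as well, but some tools emit BER with indefinite lengths, which this check rejects.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: checks whether the data is exactly one DER-encoded ASN.1 SEQUENCE.
final class DerSniffer {

    static boolean looksLikeDerSequence(byte[] data) {
        if (data.length < 2 || (data[0] & 0xFF) != 0x30) {
            return false;                       // not a constructed SEQUENCE tag
        }
        int first = data[1] & 0xFF;
        long contentLength;
        int headerLength;
        if (first < 0x80) {                     // short form: length fits in 7 bits
            contentLength = first;
            headerLength = 2;
        } else {
            int numBytes = first & 0x7F;        // long form: next numBytes bytes hold the length
            if (numBytes == 0 || numBytes > 4 || data.length < 2 + numBytes) {
                return false;                   // 0x80 means indefinite length (BER, not DER)
            }
            contentLength = 0;
            for (int i = 0; i < numBytes; i++) {
                contentLength = (contentLength << 8) | (data[2 + i] & 0xFF);
            }
            headerLength = 2 + numBytes;
        }
        // The declared length must account for the whole file, with nothing left over.
        return headerLength + contentLength == data.length;
    }

    // Reads the whole file; fine for small crypto files, not for arbitrary large inputs.
    static boolean looksLikeDerSequence(Path file) throws IOException {
        return looksLikeDerSequence(Files.readAllBytes(file));
    }
}
```

Combined with a PEM check, this covers the two common encodings of certificates and keys; telling a certificate from a key in DER would require parsing further into the structure.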

patrik
  • Take a look at `Files.probeContentType`, it might help you. – Marcos Barbero Mar 04 '20 at 16:55
  • @MarcosBarbero I have tried this already and it fails on PEM files. – patrik Mar 04 '20 at 16:56
  • It indeed doesn't recognize all file types, but you can add your own whenever needed. https://stackoverflow.com/questions/32863030/adding-file-types-to-be-recognized-by-files-probecontenttypenew-file-ttf-to?answertab=active#tab-top – Marcos Barbero Mar 04 '20 at 16:57
  • You can run Linux commands using Java as well. Perhaps the Linux file command does all that you need? – Sir Beethoven Mar 04 '20 at 17:05
  • @SirBeethoven This is one way to do it and also something I am considering. The problem is that it will fail on Windows, but it might be possible to use extension checking as a fallback. – patrik Mar 04 '20 at 17:09
  • @MarcosBarbero's solution is the best way (by a long-shot) of handling this 'using the java standard library'. – BeUndead Mar 04 '20 at 17:20
  • @MarcosBarbero This is likely an option. It is not straightforward, but I suppose something like this is not too hard to do: https://resources.infosecinstitute.com/hiding-malware-in-certificates/#gref – patrik Mar 04 '20 at 17:49
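The approach suggested in the comments, adding your own detector so that `Files.probeContentType` recognizes PEM files, could be sketched as follows. It subclasses `java.nio.file.spi.FileTypeDetector`; for `Files.probeContentType` to pick it up, the class must be public, have a public no-argument constructor, and be listed by its fully qualified name in a `META-INF/services/java.nio.file.spi.FileTypeDetector` file on the class path. The type name `application/x-pem-file` is an assumption chosen for illustration (it is not IANA-registered), and the content check is deliberately crude.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;

// Hypothetical detector; installed detectors are consulted before the system default one.
public class PemFileTypeDetector extends FileTypeDetector {

    // Illustrative, unregistered MIME type name for PEM content.
    static final String PEM_TYPE = "application/x-pem-file";

    @Override
    public String probeContentType(Path path) throws IOException {
        if (!Files.isRegularFile(path)) {
            return null;
        }
        try (InputStream in = Files.newInputStream(path)) {
            // Look only at the beginning of the file, decoded byte-for-byte.
            String head = new String(in.readNBytes(8192), StandardCharsets.ISO_8859_1);
            // Returning null means "unknown", letting the next detector have a go.
            return head.contains("-----BEGIN ") ? PEM_TYPE : null;
        }
    }
}
```

Once registered, `Files.probeContentType(Path.of("client.crt"))` should return `application/x-pem-file` for PEM content and otherwise fall through to the platform's default detection.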
