Great question!
If all you want to check for is a RAR or ZIP file appended to the end of an
image file, then running it through the unrar or unzip command is the
easiest way to do it.
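If you would rather not shell out to unzip, the ZIP half of that check can be sketched with Python's standard zipfile module, which locates the archive from the end of the file and so tolerates image data in front of it. This is only a sketch: the helper names appended_zip_members and make_demo_zip and the fake JPEG bytes are made up for illustration, and the standard library has no RAR reader, so that case still needs unrar.

```python
import io
import zipfile


def appended_zip_members(data: bytes):
    """Return the member names if zipfile can open the bytes as a ZIP, else None.

    Hypothetical helper: zipfile finds the archive via the end-of-central-
    directory record near the end of the data, so leading image bytes are fine.
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return zf.namelist()
    except zipfile.BadZipFile:
        return None


def make_demo_zip() -> bytes:
    """Build a tiny in-memory ZIP archive to use as the hidden payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("secret.txt", "hidden payload")
    return buf.getvalue()


# Stand-in for an image: JPEG start marker, filler bytes, JPEG end marker.
FAKE_JPEG = b"\xff\xd8\xff\xe0" + b"\x00" * 64 + b"\xff\xd9"

if __name__ == "__main__":
    print(appended_zip_members(FAKE_JPEG))
    print(appended_zip_members(FAKE_JPEG + make_demo_zip()))
```

For a real upload you would read the file's bytes and pass them in; a non-None result means something ZIP-shaped is sitting at the end of the image.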
If you want a faster but less exact check, you can look for some of the
special file format signatures that indicate certain types of files. The
usual UNIX tool for identifying file formats is file. It uses a
database of binary file signatures, whose format is
defined in the magic(5) man page. It won’t find a RAR file for
you at the end of a JPEG, because it only looks at the start of files to
try to identify them quickly, but you might be able to modify its source code
to do what you want. You could also reuse its database of file signatures.
If you look at the Rar files section in the archive part of that database, it shows this:
# RAR archiver (Greg Roelofs, newt@uchicago.edu)
0 string Rar! RAR archive data,
which indicates that if your JPEG file contains the four bytes Rar!, that
would be suspicious. But you would have to examine the RAR file format
spec in detail and check whether more of the RAR file structure is
present in order to avoid false positives. This web page also contains the
four bytes Rar!, but there are no hidden files attached to it :P
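As a rough sketch of that kind of signature scan, the Python below searches a file's bytes for archive signatures anywhere past the first byte. To cut down on false positives like the one on this page, it assumes the longer six-byte RAR marker (Rar! followed by 0x1A 0x07) rather than the four-byte string alone, plus the ZIP local-file-header marker; the names SIGNATURES and find_signatures are made up for this example, and a hit is only a hint that still needs checking against the full format.

```python
# Signatures assumed for this sketch: the common prefix of RAR 4 and RAR 5
# archive markers, and the ZIP local file header marker.
SIGNATURES = {
    "rar": b"Rar!\x1a\x07",
    "zip": b"PK\x03\x04",
}


def find_signatures(data: bytes, start: int = 1):
    """Return sorted (offset, kind) pairs for each signature found at or after start.

    start defaults to 1 so that a file which simply *is* an archive
    (signature at offset 0) is not reported as an embedded one.
    """
    hits = []
    for kind, magic in SIGNATURES.items():
        pos = data.find(magic, start)
        while pos != -1:
            hits.append((pos, kind))
            pos = data.find(magic, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Fake JPEG (start marker, filler, end marker) with a RAR marker appended.
    fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"
    print(find_signatures(fake_jpeg + b"Rar!\x1a\x07\x00 made-up body"))
    # Plain text that merely mentions the four bytes is not flagged.
    print(find_signatures(b"this page contains Rar! as text"))
```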
But if someone knows the details of your automated checks, they could
easily work around them. The simplest workaround would be to reverse all the bytes
of the files before appending them to the JPEG. Then none of your
signatures would catch the reversed version of the file.
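To make that concrete, here is a small Python illustration of why a byte-reversed payload slips past a signature check. The payload is a made-up stand-in (a RAR-style marker plus filler), not a real archive, and the function names are hypothetical.

```python
RAR_MAGIC = b"Rar!\x1a\x07"  # marker bytes assumed for this illustration


def contains_rar_magic(data: bytes) -> bool:
    """Naive signature check: is the RAR marker anywhere in the data?"""
    return RAR_MAGIC in data


def reverse_bytes(data: bytes) -> bytes:
    """Reverse the byte order; applying it twice restores the original."""
    return data[::-1]


# Made-up payload standing in for an archive someone wants to hide.
payload = RAR_MAGIC + b"\x00" + b"made-up archive body"
hidden = reverse_bytes(payload)

if __name__ == "__main__":
    print(contains_rar_magic(payload))       # the plain payload is caught
    print(contains_rar_magic(hidden))        # the reversed payload is not
    print(reverse_bytes(hidden) == payload)  # and the recipient can undo it
```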
If someone really wants to hide a file inside an image, there are all sorts
of ways to do it that you won’t be able to detect easily. The general
term for this is “steganography.” The Wikipedia page on steganography, for
example, shows a picture of trees that has a picture of a cat hidden inside
it. For simpler steganographic methods, there are statistical tests that
can indicate something funny has been done to a picture, but if someone
spends a lot of time to come up with their own method to hide other files
inside images, you won’t be able to detect it.