
The situation is this: my code downloads a picture from the Internet as an array of bytes, byte[] imageBytes. I then create an object of my MyImage class:

public class MyImage {

    private String imageName;
    private byte[] data;

    // Constructor, getters, setters
}

Create object:

MyImage myimg = new MyImage("some_name", imageBytes);

Then I use my resulting object.

My question: can I somehow make sure that what I downloaded is actually a picture file?

It seems that each file type has its own signature (that is, the first bytes of the file). But there are a lot of different file types, and comparing my file's signature against every existing signature would require a very large number of checks.

Is there a simpler check that tells me whether I have an image (no matter what type) or not?

Why I want this: the user might mistakenly pass a link not to an image but to a file of a different type (for example, an archive). In that case, I would like the code to detect that the wrong file was downloaded and show the user an appropriate error message.

alex_t
  • 3
    It is not possible to detect whether or not a file is an image *of any type*. Certain image types are indistinguishable from files that are not images; e.g. an image that has been compressed using a generic compressor or encrypted using a generic encryptor cannot be *reliably* distinguished from a compressed / encrypted non-image file. (And file signatures are not a guarantee either.) – Stephen C May 06 '22 at 05:19
  • 2
    There is no easy way to check if a file is "an image" without attempting to parse the file in some image library/method and see if it returns a valid image. The best you can do is to catch the most common formats (JPEG, PNG, TIFF, BMP, etc.) and, as you say, you can often find if they are one of these file types by reading the first few bytes of a file. For example, the first eight bytes of a PNG file always contain the following (decimal) values `137 80 78 71 13 10 26 10`, and a similar thing is true of most common file types; you just need to look up the specification for each format. – sorifiend May 06 '22 at 05:23
  • 2
    The way to deal with this is to do file signature detection for a specific set of image types ... not all images "no matter what type". There are libraries that do file signature detection. – Stephen C May 06 '22 at 05:25
  • @StephenC, can you suggest which Java library I should use (for file signature detection)? – alex_t May 06 '22 at 06:23
  • Try to turn it into an image with `javax.imageio.ImageIO`. – user207421 May 06 '22 at 07:22
  • 2
    As your application downloads images (presumably using HTTP(S)) from the internet, I think inspecting the `Content-Type` header makes sense. Images should be served with an `image/` MIME type. Of course, this is not 100% reliable, as servers *may* give the wrong MIME type, or may serve images with `application/octet-stream` or similar. But it's a lot "cheaper" than inspecting the bytes. And you may even avoid the download entirely, in cases where it's not an image. – Harald K May 06 '22 at 07:39
  • Perhaps you could even use the HTTP mechanics for this by specifying an [`Accept` header](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept), like `Accept: image/*`, in your download request. The server *should* then respond with either an image or an error. Unfortunately, I'm not sure if this will work reliably (probably due to historical browser issues). – Harald K May 06 '22 at 07:47
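
The signature comparison suggested in the comments above can be kept small by checking only a handful of common formats. The following is a minimal sketch, not a complete detector: the class name `ImageSignatures`, its method names, and the choice of four formats (PNG, JPEG, GIF, BMP) are assumptions made for illustration. The PNG bytes are the ones quoted in the comment above; a match only says the file starts like an image, not that the rest of it is valid.

```java
final class ImageSignatures {

    // Leading bytes ("magic numbers") of a few common image formats.
    private static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] GIF = {'G', 'I', 'F', '8'};
    private static final byte[] BMP = {'B', 'M'};

    private ImageSignatures() {
    }

    /** Returns a short format name, or null if no known signature matches. */
    static String detect(byte[] data) {
        if (data == null) {
            return null;
        }
        if (startsWith(data, PNG)) {
            return "png";
        }
        if (startsWith(data, JPEG)) {
            return "jpeg";
        }
        if (startsWith(data, GIF)) {
            return "gif";
        }
        if (startsWith(data, BMP)) {
            return "bmp";
        }
        return null;
    }

    /** Convenience wrapper: true if the data starts with one of the known signatures. */
    static boolean isKnownImage(byte[] data) {
        return detect(data) != null;
    }

    // True if data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the question's variable, a call such as `ImageSignatures.isKnownImage(imageBytes)` could run before constructing `MyImage`. Short signatures like the two-byte BMP one can match unrelated files, so this is best treated as a quick first filter.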

1 Answer


One option is to check the `Content-Type` header sent by the server you download from. Another is to apply the same detection that the Linux `file` command uses, although reimplementing that in Java would be tedious.
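
As a rough illustration of the header check, here is a sketch using `HttpURLConnection`. The class name `ContentTypeCheck` and both method names are made up for this example, and it assumes the server answers `HEAD` requests, which not every server does. The header can also be wrong or generic, so a `false` result is a hint rather than proof.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

final class ContentTypeCheck {

    private ContentTypeCheck() {
    }

    /** True if the header value names an image MIME type, e.g. "image/png". */
    static boolean isImageContentType(String contentType) {
        if (contentType == null) {
            return false;
        }
        return contentType.trim().toLowerCase(Locale.ROOT).startsWith("image/");
    }

    /** Sends a HEAD request and inspects Content-Type without downloading the body. */
    static boolean looksLikeImage(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            // Ask for images only; servers are free to ignore this header.
            conn.setRequestProperty("Accept", "image/*");
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK
                    && isImageContentType(conn.getContentType());
        } finally {
            conn.disconnect();
        }
    }
}
```

If the download code already has the response headers at hand, calling `isImageContentType` on the received `Content-Type` value avoids the extra request.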

So I suggest one of the following:

  • check the `Content-Type` header
  • assuming you are on a *nix system: store the data in a file, then run the `file` command on it and read the result
  • just try to parse the image and let the graphics library decide. If parsing succeeds, check the image you obtained
  • use one of the libraries listed in "How to reliably detect file types?"
  • ask the JDK via `Files.probeContentType()`, which I consider the most promising way
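
For the "let the graphics library decide" option, a minimal sketch with `javax.imageio.ImageIO` could look like the following. The class and method names are hypothetical. Note that a stock JDK only decodes the formats it ships readers for (typically PNG, JPEG, GIF, BMP and a few others), so a `false` result means "not decodable here", not necessarily "not an image".

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageParseCheck {

    private ImageParseCheck() {
    }

    /** True if ImageIO finds a reader for the bytes and decodes a non-empty image. */
    static boolean isDecodableImage(byte[] data) {
        if (data == null || data.length == 0) {
            return false;
        }
        try {
            // ImageIO.read returns null when no registered reader recognizes the input.
            BufferedImage image = ImageIO.read(new ByteArrayInputStream(data));
            return image != null && image.getWidth() > 0 && image.getHeight() > 0;
        } catch (IOException e) {
            // A reader recognized the format but the content was truncated or corrupt.
            return false;
        }
    }
}
```

In the question's code this could guard the constructor call, for example by throwing an exception with a user-facing message when `ImageParseCheck.isDecodableImage(imageBytes)` returns `false`. Full decoding costs more than a signature check, but it also catches files that merely start like an image.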
Queeg
  • 1
    I like the idea of checking the content type header! As the OP downloads a file from the internet, this information should be easily available. And it certainly makes the check easier, as all you need to do is test if the type starts with `"image/"`. However, I think this answer can be improved to show a PoC of this with Java code. Currently, it's more like a "Have you tried this?"-type comment. – Harald K May 06 '22 at 07:26
  • Unfortunately, the magic element in the Unix `file` command is not merely an algorithm. It relies on a self-contained database of file signatures, which is periodically updated. As far as I know there's no proper Java implementation, but output from Unix `file` could be handled through `ProcessBuilder`. – g00se May 06 '22 at 08:15
  • 2
    Well, there is some danger with the Content-Type. It is easy to send either a wrong or generic content type, and typical clients will still process the content correctly. We do not know how reliable that internet download thingy is. Hence I propose to do a local check. And the most efficient yet portable one would be the last one I mentioned. – Queeg May 06 '22 at 11:35
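
The `ProcessBuilder` route mentioned in the comments might be sketched as follows. This assumes a *nix system with the `file` utility on the `PATH`; the class name `FileCommandProbe` and the five-second timeout are arbitrary choices for illustration. When `file` is missing or fails, the method returns an empty `Optional` instead of throwing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.concurrent.TimeUnit;

final class FileCommandProbe {

    private FileCommandProbe() {
    }

    /** Writes the bytes to a temp file and asks `file` for the MIME type. */
    static Optional<String> mimeTypeOf(byte[] data) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("download-", ".bin");
            Files.write(tmp, data);
            Process process = new ProcessBuilder(
                    "file", "--brief", "--mime-type", tmp.toString())
                    .redirectErrorStream(true)
                    .start();
            // Read everything the command prints, e.g. "image/png".
            String output = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8).trim();
            if (!process.waitFor(5, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                return Optional.empty();
            }
            return process.exitValue() == 0 && !output.isEmpty()
                    ? Optional.of(output)
                    : Optional.empty();
        } catch (IOException e) {
            // Typically: `file` is not installed or the temp file could not be written.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup only.
                }
            }
        }
    }
}
```

A caller could then test `mimeTypeOf(imageBytes).filter(t -> t.startsWith("image/")).isPresent()`. This is not portable to systems without `file`, which is the trade-off against the pure-Java checks.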